From mboxrd@z Thu Jan 1 00:00:00 1970
From: Florian Weimer
To: "H.J. Lu"
Cc: libc-alpha@sourceware.org, Oleh Derevenko, Arjan van de Ven,
	Andreas Schwab, Hongyu Wang, liuhongt
Subject: Re: [PATCH v3] x86: Optimize atomic_compare_and_exchange_[val|bool]_acq [BZ #28537]
References: <20211104161443.734681-1-hjl.tools@gmail.com>
Date: Mon, 08 Nov 2021 18:36:04 +0100
In-Reply-To: <20211104161443.734681-1-hjl.tools@gmail.com> (H. J. Lu's
	message of "Thu, 4 Nov 2021 09:14:43 -0700")
Message-ID: <875yt2h7nf.fsf@oldenburg.str.redhat.com>

* H. J. Lu:

> The fixed compiler should define __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK
> to indicate that compiler will generate the check with the volatile load.
> Then glibc can check __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK to avoid the
> extra volatile load.

I don't see __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK in the patch, so
this usage is confusing.  It would be illuminating to know which code
path concerns you the most.

Looking at the default mutex implementation, I think we eventually end
up here (in sysdeps/nptl/lowlevellock.h):

/* This is an expression rather than a statement even though its
   value is void, so that it can be used in a comma expression or as
   an expression that's cast to void.  */
/* The inner conditional compiles to a call to __lll_lock_wait_private
   if private is known at compile time to be LLL_PRIVATE, and to a
   call to __lll_lock_wait otherwise.  */
/* If FUTEX is 0 (not acquired), set to 1 (acquired with no waiters)
   and return.  Otherwise, ensure that it is >1 (acquired, possibly
   with waiters) and then block until we acquire the lock, at which
   point FUTEX will still be >1.  The lock is always acquired on
   return.  */
#define __lll_lock(futex, private)                                      \
  ((void)                                                               \
   ({                                                                   \
     int *__futex = (futex);                                            \
     if (__glibc_unlikely                                               \
         (atomic_compare_and_exchange_bool_acq (__futex, 1, 0)))        \
       {                                                                \
         if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
           __lll_lock_wait_private (__futex);                           \
         else                                                           \
           __lll_lock_wait (__futex, private);                          \
       }                                                                \
   }))
#define lll_lock(futex, private)	\
  __lll_lock (&(futex), private)

We already have an optimization that does not count the waiters (the
futex word only takes the values 0, 1, 2).  However, the code as
written clearly tells the compiler that CAS failure is rare (the
__glibc_unlikely).

Inside __lll_lock_wait_private, we do another load:

void
__lll_lock_wait_private (int *futex)
{
  if (atomic_load_relaxed (futex) == 2)
    goto futex;

  while (atomic_exchange_acquire (futex, 2) != 0)
    {
    futex:
      LIBC_PROBE (lll_lock_wait_private, 1, futex);
      futex_wait ((unsigned int *) futex, 2, LLL_PRIVATE); /* Wait if *futex == 2.  */
    }
}

On x86, the unconditional exchange does not use CAS, but there is
still that extra load.  I think we should drop the __glibc_unlikely
(because we don't know whether CAS failure is actually rare; in the
contended case it clearly is not), move the load into __lll_lock,
check it before the CAS, and also pass the loaded value down to
__lll_lock_wait* (to avoid the redundant load there); a rough sketch
follows below.  It will slightly bloat the inline __lll_lock
implementation, but so does putting a check into
atomic_compare_and_exchange_bool_acq.

Thanks,
Florian
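
Concretely, something along these lines.  This is only a minimal
stand-alone sketch using C11 <stdatomic.h> instead of glibc's internal
atomic_* macros: the fast_path_lock/slow_path_lock names are made up,
the 0/1/2 waiter encoding is ignored, and a real slow path would block
in futex_wait instead of spinning.

#include <stdatomic.h>

/* Hypothetical stand-in for __lll_lock_wait*: it reuses the value the
   caller already observed instead of reloading the futex word first.
   A real implementation would block in futex_wait; this one spins.  */
static void
slow_path_lock (atomic_int *futex, int observed)
{
  for (;;)
    {
      if (observed == 0
          && atomic_compare_exchange_weak_explicit (futex, &observed, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
        return;
      observed = atomic_load_explicit (futex, memory_order_relaxed);
    }
}

/* Hypothetical stand-in for __lll_lock: load the futex word first, so
   a contended lock does not issue a CAS that is bound to fail, and
   hand the observed value down to the slow path.  */
static void
fast_path_lock (atomic_int *futex)
{
  int observed = atomic_load_explicit (futex, memory_order_relaxed);
  if (observed != 0
      || !atomic_compare_exchange_strong_explicit (futex, &observed, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed))
    slow_path_lock (futex, observed);
}

The point is only the shape: the relaxed load lets the inline fast
path skip the CAS entirely when the lock is visibly held, and the slow
path gets the already-loaded value for free.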