From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Subject: [PATCH 2/3] Mutex: Only read while spinning
To: libc-alpha@sourceware.org
References: <1522394093-9835-1-git-send-email-kemi.wang@intel.com> <1522394093-9835-2-git-send-email-kemi.wang@intel.com>
Date: Thu, 05 Apr 2018 20:55:00 -0000
In-Reply-To: <1522394093-9835-2-git-send-email-kemi.wang@intel.com>

On 30/03/2018 04:14, Kemi Wang wrote:
> The pthread adaptive spin mutex spins on the lock for a while before going
> to sleep.  While the lock is contended and we need to wait, going straight
> back to LLL_MUTEX_TRYLOCK (cmpxchg) is not a good idea on many targets, as
> that forces expensive memory synchronization among processors and
> penalizes other running threads.  For example, it constantly floods the
> system with "read for ownership" requests, which are much more expensive
> to process than a single read.  Thus, we only use a relaxed MO read until
> we observe that the lock is no longer acquired, as suggested by Andi Kleen.
>
> Test machine:
> 2-socket Skylake platform, 112 cores with 62G RAM
>
> Test case: Contended pthread adaptive spin mutex with global update.
> Each thread of the workload does:
> a) Lock the mutex (adaptive spin type)
> b) Increment a global variable
> c) Unlock the mutex
> in a loop until timeout, and the main thread reports the total iteration
> count of all the threads after one second.
>
> This test case is the same as Will-it-scale.pthread_mutex3 except that the
> mutex type is modified to PTHREAD_MUTEX_ADAPTIVE_NP.
> github: https://github.com/antonblanchard/will-it-scale.git
>
> nr_threads      base         head(SPIN_COUNT=10)    head(SPIN_COUNT=1000)
> 1               51644585     51307573 (-0.7%)       51323778 (-0.6%)
> 2               7914789      10011301 (+26.5%)      9867343 (+24.7%)
> 7               1687620      4224135 (+150.3%)      3430504 (+103.3%)
> 14              1026555      3784957 (+268.7%)      1843458 (+79.6%)
> 28              962001       2886885 (+200.1%)      681965 (-29.1%)
> 56              883770       2740755 (+210.1%)      364879 (-58.7%)
> 112             1150589      2707089 (+135.3%)      415261 (-63.9%)

pthread_mutex3 basically measures updates to a global variable synchronized
with a mutex, so if I am reading the benchmark correctly, a higher value
means less contention.  I also assume you use the 'threads' value in this
table.

I checked on a 64-core aarch64 machine to see what kind of improvement, if
any, one would get with this change:

nr_threads      base         head(SPIN_COUNT=10)     head(SPIN_COUNT=1000)
1               27566206     28778254 (4.211680)     28778467 (4.212389)
2               8498813      7777589 (-9.273105)     7806043 (-8.874791)
7               5019434      2869629 (-74.915782)    3307812 (-51.744839)
14              4379155      2906255 (-50.680343)    2825041 (-55.012087)
28              4397464      3261094 (-34.846282)    3259486 (-34.912805)
56              4020956      3898386 (-3.144122)     4038505 (0.434542)

So I think this change should be platform-specific.

>
> Suggested-by: Andi Kleen
> Signed-off-by: Kemi Wang
> ---
>  nptl/pthread_mutex_lock.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index 1519c14..c3aca93 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -26,6 +26,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #ifndef lll_lock_elision
>  #define lll_lock_elision(lock, try_lock, private) ({ \
> @@ -124,16 +125,22 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>    if (LLL_MUTEX_TRYLOCK (mutex) != 0)
>      {
>        int cnt = 0;
> -      int max_cnt = MIN (MAX_ADAPTIVE_COUNT,
> -                         mutex->__data.__spins * 2 + 10);
> +      int max_cnt = MIN (__mutex_aconf.spin_count,
> +                         mutex->__data.__spins * 2 + 100);
>        do
>          {
> -          if (cnt++ >= max_cnt)
> -            {
> -              LLL_MUTEX_LOCK (mutex);
> -              break;
> -            }
> -          atomic_spin_nop ();
> +          if (cnt >= max_cnt)
> +            {
> +              LLL_MUTEX_LOCK (mutex);
> +              break;
> +            }
> +          /* MO read while spinning */
> +          do
> +            {
> +              atomic_spin_nop ();
> +            }
> +          while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
> +                 ++cnt < max_cnt);
>          }
>        while (LLL_MUTEX_TRYLOCK (mutex) != 0);
>
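For readers following along, the technique the patch implements is the
classic "test and test-and-set" spin: instead of hammering the lock with
LLL_MUTEX_TRYLOCK (a cmpxchg that generates read-for-ownership traffic),
the waiter spins on a cheap relaxed load and only retries the trylock once
the lock has been observed free.  Below is a minimal, self-contained sketch
of that pattern using C11 atomics; it is not the glibc code, and the names
spin_lock_adaptive, spin_unlock and MAX_SPIN are made up for illustration:

  #include <stdatomic.h>
  #include <stdbool.h>

  #define MAX_SPIN 1000   /* Illustrative bound, in the spirit of
                             __mutex_aconf.spin_count.  */

  static bool
  try_acquire (atomic_int *lock)
  {
    int expected = 0;
    /* Expensive read-modify-write; the analogue of LLL_MUTEX_TRYLOCK.  */
    return atomic_compare_exchange_strong (lock, &expected, 1);
  }

  static void
  spin_lock_adaptive (atomic_int *lock)
  {
    if (try_acquire (lock))
      return;                        /* Uncontended fast path.  */

    int cnt = 0;
    do
      {
        if (cnt >= MAX_SPIN)
          {
            /* Spun long enough; a real implementation would block here
               (futex wait) rather than keep burning CPU.  */
            while (!try_acquire (lock))
              ;
            return;
          }
        /* Spin on a plain relaxed load until the lock looks free, so the
           cache line stays shared instead of bouncing between cores.  */
        do
          ;                          /* A pause/yield hint would go here.  */
        while (atomic_load_explicit (lock, memory_order_relaxed) != 0
               && ++cnt < MAX_SPIN);
      }
    while (!try_acquire (lock));     /* Lock looked free: retry the cmpxchg.  */
  }

  static void
  spin_unlock (atomic_int *lock)
  {
    atomic_store_explicit (lock, 0, memory_order_release);
  }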
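And for context on the numbers above, the measured workload is just a tight
lock / global-increment / unlock loop per thread on a
PTHREAD_MUTEX_ADAPTIVE_NP mutex.  A rough sketch of one worker thread
(hypothetical names, not the actual will-it-scale harness):

  #define _GNU_SOURCE              /* For PTHREAD_MUTEX_ADAPTIVE_NP.  */
  #include <pthread.h>
  #include <stdint.h>

  static pthread_mutex_t mutex;
  static unsigned long long counter;   /* The shared global being updated.  */
  static volatile int stop;            /* Set by the main thread at timeout.  */

  static void
  init_mutex (void)
  {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init (&attr);
    /* The only difference from pthread_mutex3 is the mutex type.  */
    pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    pthread_mutex_init (&mutex, &attr);
    pthread_mutexattr_destroy (&attr);
  }

  static void *
  worker (void *arg)
  {
    unsigned long long iters = 0;
    while (!stop)
      {
        pthread_mutex_lock (&mutex);   /* Adaptive spin mutex.  */
        counter++;                     /* Global variable increment.  */
        pthread_mutex_unlock (&mutex);
        iters++;
      }
    /* The main thread sums the per-thread iteration counts after one second.  */
    return (void *) (uintptr_t) iters;
  }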