From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 3/3] Mutex: Avoid useless spinning
To: Adhemerval Zanella , libc-alpha@sourceware.org
Cc: Andi Kleen , "Chen, Tim C" , Dave , "ying.huang" , aaron , aubrey
From: kemi
Message-ID: <3a8129d5-b9a6-91dd-0117-f45993908997@intel.com>
Date: Sun, 08 Apr 2018 08:33:00 -0000
In-Reply-To: <44b112b8-829e-bcfd-4eac-fdd8362862ca@linaro.org>
References: <1522394093-9835-1-git-send-email-kemi.wang@intel.com> <1522394093-9835-3-git-send-email-kemi.wang@intel.com> <44b112b8-829e-bcfd-4eac-fdd8362862ca@linaro.org>
X-SW-Source: 2018-04/txt/msg00173.txt.bz2

On 2018-04-06 04:59, Adhemerval Zanella wrote:
>
>
> On 30/03/2018 04:14, Kemi Wang wrote:
>> Usually, we can't set too short a timeout while spinning on the lock;
>> that would probably make a thread trying to acquire the lock go to
>> sleep quickly, which weakens the benefit of the pthread adaptive spin
>> lock.
>>
>> However, there is also a problem if we set the timeout too large when
>> protecting a small critical section under severe lock contention.
>> As the test results in the last patch show, performance is highly
>> affected by the spin count tunable: the smaller the spin count, the
>> larger the performance improvement. This is because, under severe lock
>> contention, a thread probably spins on the lock until the timeout
>> before going to sleep.
>>
>> In this patch, we avoid the useless spinning by making the spinner go
>> to sleep if it fails to acquire the lock after observing it become
>> available, as suggested by Tim Chen.
>>
>> nr_threads    base        COUNT=1000(head~1)    COUNT=1000(head)
>> 1             51644585    51323778(-0.6%)       51378551(-0.5%)
>> 2             7914789     9867343(+24.7%)       11503559(+45.3%)
>> 7             1687620     3430504(+103.3%)      7817383(+363.2%)
>> 14            1026555     1843458(+79.6%)       7360883(+617.0%)
>> 28            962001      681965(-29.1%)        5681945(+490.6%)
>> 56            883770      364879(-58.7%)        3416068(+286.5%)
>> 112           1150589     415261(-63.9%)        3255700(+183.0%)
>
> As before [2], I checked the change on a 64-core aarch64 machine, but
> differently from the previous patch, this one seems to show improvements:
>
> nr_threads    base        head(SPIN_COUNT=10)      head(SPIN_COUNT=1000)
> 1             27566206    28776779 (4.206770)      28778073 (4.211078)
> 2             8498813     9129102 (6.904173)       7042975 (-20.670782)
> 7             5019434     5832195 (13.935765)      5098511 (1.550982)
> 14            4379155     6507212 (32.703053)      5200018 (15.785772)
> 28            4397464     4584480 (4.079329)       4456767 (1.330628)
> 56            4020956     3534899 (-13.750237)     4096197 (1.836850)
>
> I would suggest you squash both patches into one for version 2.
>
OK, though the separation here makes review easier.
>>
>> Suggested-by: Tim Chen
>> Signed-off-by: Kemi Wang
>> ---
>>  nptl/pthread_mutex_lock.c | 25 +++++++++++--------------
>>  1 file changed, 11 insertions(+), 14 deletions(-)
>>
>> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
>> index c3aca93..0faee1a 100644
>> --- a/nptl/pthread_mutex_lock.c
>> +++ b/nptl/pthread_mutex_lock.c
>> @@ -127,22 +127,19 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>>            int cnt = 0;
>>            int max_cnt = MIN (__mutex_aconf.spin_count,
>>                               mutex->__data.__spins * 2 + 100);
>> +
>> +          /* MO read while spinning */
>>            do
>> -            {
>> -              if (cnt >= max_cnt)
>> -                {
>> -                  LLL_MUTEX_LOCK (mutex);
>> -                  break;
>> -                }
>> -              /* MO read while spinning */
>> -              do
>> -                {
>> -                  atomic_spin_nop ();
>> -                }
>> -              while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
>> +            {
>> +              atomic_spin_nop ();
>> +            }
>> +          while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
>>                   ++cnt < max_cnt);
>> -            }
>> -          while (LLL_MUTEX_TRYLOCK (mutex) != 0);
>> +          /* Try to acquire the lock if lock is available or the spin count
>> +           * is run out, go to sleep if fails
>> +           */
>> +          if (LLL_MUTEX_TRYLOCK (mutex) != 0)
>> +            LLL_MUTEX_LOCK (mutex);
>>
>>            mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
>>          }
>>
>
> Please fix the format issue here.
>