From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 3/3] Mutex: Avoid useless spinning
To: Adhemerval Zanella , libc-alpha@sourceware.org
Cc: Andi Kleen , "Chen, Tim C" , Dave , "ying.huang" , aaron , aubrey
From: kemi
Message-ID: <3a8129d5-b9a6-91dd-0117-f45993908997@intel.com>
Date: Sun, 08 Apr 2018 08:33:00 -0000
In-Reply-To: <44b112b8-829e-bcfd-4eac-fdd8362862ca@linaro.org>
References: <1522394093-9835-1-git-send-email-kemi.wang@intel.com> <1522394093-9835-3-git-send-email-kemi.wang@intel.com> <44b112b8-829e-bcfd-4eac-fdd8362862ca@linaro.org>
X-SW-Source: 2018-04/txt/msg00173.txt.bz2

On 2018-04-06 04:59, Adhemerval Zanella wrote:
>
>
> On 30/03/2018 04:14, Kemi Wang wrote:
>> Usually, we can't set too short a timeout while spinning on the lock;
>> that would probably make a thread trying to acquire the lock go to
>> sleep quickly, which weakens the benefit of the pthread adaptive spin
>> lock.
>>
>> However, there is also a problem if we set the timeout too large when
>> protecting a small critical section under severe lock contention.
>> As the test results in the last patch show, performance is highly
>> affected by the spin count tunable: the smaller the spin count, the
>> larger the performance improvement. This is because, under severe lock
>> contention, a thread probably spins on the lock until the timeout
>> before going to sleep.
>>
>> In this patch, we avoid the useless spinning by making the spinner go
>> to sleep if it fails to acquire the lock after observing it become
>> available, as suggested by Tim Chen.
>>
>> nr_threads    base        COUNT=1000(head~1)    COUNT=1000(head)
>> 1             51644585    51323778(-0.6%)       51378551(-0.5%)
>> 2             7914789     9867343(+24.7%)       11503559(+45.3%)
>> 7             1687620     3430504(+103.3%)      7817383(+363.2%)
>> 14            1026555     1843458(+79.6%)       7360883(+617.0%)
>> 28            962001      681965(-29.1%)        5681945(+490.6%)
>> 56            883770      364879(-58.7%)        3416068(+286.5%)
>> 112           1150589     415261(-63.9%)        3255700(+183.0%)
>
> As before [2], I checked the change on a 64-core aarch64 machine, but
> differently from the previous patch, this one seems to show improvements:
>
> nr_threads    base        head(SPIN_COUNT=10)      head(SPIN_COUNT=1000)
> 1             27566206    28776779 (4.206770)      28778073 (4.211078)
> 2             8498813     9129102 (6.904173)       7042975 (-20.670782)
> 7             5019434     5832195 (13.935765)      5098511 (1.550982)
> 14            4379155     6507212 (32.703053)      5200018 (15.785772)
> 28            4397464     4584480 (4.079329)       4456767 (1.330628)
> 56            4020956     3534899 (-13.750237)     4096197 (1.836850)
>
> I would suggest you squash both patches into one for version 2.
>
OK, though the separation here makes review easier.
>>
>> Suggested-by: Tim Chen
>> Signed-off-by: Kemi Wang
>> ---
>>  nptl/pthread_mutex_lock.c | 25 +++++++++++--------------
>>  1 file changed, 11 insertions(+), 14 deletions(-)
>>
>> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
>> index c3aca93..0faee1a 100644
>> --- a/nptl/pthread_mutex_lock.c
>> +++ b/nptl/pthread_mutex_lock.c
>> @@ -127,22 +127,19 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>>            int cnt = 0;
>>            int max_cnt = MIN (__mutex_aconf.spin_count,
>>                               mutex->__data.__spins * 2 + 100);
>> +
>> +          /* MO read while spinning */
>>            do
>> -            {
>> -              if (cnt >= max_cnt)
>> -                {
>> -                  LLL_MUTEX_LOCK (mutex);
>> -                  break;
>> -                }
>> -              /* MO read while spinning */
>> -              do
>> -                {
>> -                  atomic_spin_nop ();
>> -                }
>> -              while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
>> +            {
>> +              atomic_spin_nop ();
>> +            }
>> +          while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
>>                   ++cnt < max_cnt);
>> -            }
>> -          while (LLL_MUTEX_TRYLOCK (mutex) != 0);
>> +          /* Try to acquire the lock if lock is available or the spin count
>> +           * is run out, go to sleep if fails
>> +           */
>> +          if (LLL_MUTEX_TRYLOCK (mutex) != 0)
>> +            LLL_MUTEX_LOCK (mutex);
>>
>>            mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
>>          }
>>
>
> Please fix the format issue here.
>