From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v2 3/3] Mutex: Optimize adaptive spin algorithm
To: kemi, Adhemerval Zanella, Glibc alpha
Cc: Dave Hansen, Tim Chen, Andi Kleen, Ying Huang, Aaron Lu, Lu Aubrey
References: <1524624988-29141-1-git-send-email-kemi.wang@intel.com> <1524624988-29141-3-git-send-email-kemi.wang@intel.com> <591d1f86-21e8-0a01-721b-4adff26e839b@redhat.com>
From: Florian Weimer
Message-ID: <5873b82e-97f2-dd8d-ab55-353138264517@redhat.com>
Date: Tue, 08 May 2018 15:08:00 -0000

On 05/02/2018 01:04 PM, kemi wrote:
>
> On 02 May 2018 16:19, Florian Weimer wrote:
>> On 04/25/2018 04:56 AM, Kemi Wang wrote:
>>> @@ -124,21 +125,24 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>>>         if (LLL_MUTEX_TRYLOCK (mutex) != 0)
>>>       {
>>>         int cnt = 0;
>> …
>>> +      int max_cnt = MIN (__mutex_aconf.spin_count,
>>> +            mutex->__data.__spins * 2 + 100);
>>> +
>>> +      /* MO read while spinning */
>>> +      do
>>> +        {
>>> +         atomic_spin_nop ();
>>> +        }
>>> +      while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
>>> +            ++cnt < max_cnt);
>>> +        /* Try to acquire the lock if lock is available or the spin count
>>> +         * is run out, call into kernel to block if fails
>>> +         */
>>> +      if (LLL_MUTEX_TRYLOCK (mutex) != 0)
>>> +        LLL_MUTEX_LOCK (mutex);
>>>
>> …
>>> +      mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
>>> +    }
>>
>> The indentation is off.  Comments should end with “.  ” (dot and two spaces).  Multi-line comments do not start with “*” on subsequent lines.  We don't use braces when we can avoid them.  Operators such as “&&” should be on the following line when breaking up lines.
>>
>
> Will fold these changes into the next version.
> I am not familiar with the glibc coding style; apologies for that.

No apology needed; it takes some time to get used to.

>> Why is the LLL_MUTEX_TRYLOCK call still needed?  Shouldn't an unconditional call to LLL_MUTEX_LOCK be sufficient?
>>
>
> The purpose of calling LLL_MUTEX_TRYLOCK here is to try to acquire the lock in user
> space, without blocking, once we have observed that the lock is available.  Thus, in case of multiple
> spinners contending for the lock, only one spinner can acquire the lock successfully
> and the others block.
>
> I am not sure that an unconditional call to LLL_MUTEX_LOCK, as you suggest, satisfies
> this purpose.

It's what we use for the default case.  It expands to lll_lock, so it should try atomic_compare_and_exchange_bool_acq first and only perform a futex syscall if there is contention.  So I do think that LLL_MUTEX_TRYLOCK is redundant here.
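
To make that concrete, here is a rough standalone model, written with C11 atomics, of the fast-path/slow-path split I would expect lll_lock to give us.  The names (model_lll_lock, slow_path_wait) are made up for illustration, and the slow path is only a stand-in for the real futex wait; this is not the glibc implementation.

#include <stdatomic.h>

/* Stand-in for the futex-based slow path: in glibc this would mark
   the futex word as contended and block in the kernel.  Here it just
   spins and retries, purely for illustration.  */
void
slow_path_wait (atomic_int *futex)
{
  int expected;
  do
    {
      while (atomic_load_explicit (futex, memory_order_relaxed) != 0)
        ;
      expected = 0;
    }
  while (!atomic_compare_exchange_weak_explicit (futex, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed));
}

/* Model of the expected lll_lock behavior: one user-space CAS with
   acquire semantics, falling back to the slow path only on
   contention.  */
void
model_lll_lock (atomic_int *futex)
{
  int expected = 0;
  if (!atomic_compare_exchange_strong_explicit (futex, &expected, 1,
                                                memory_order_acquire,
                                                memory_order_relaxed))
    slow_path_wait (futex);
}

If lll_lock does start with that CAS, the separate LLL_MUTEX_TRYLOCK on the line before merely repeats the same user-space attempt.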
Perhaps manually review the disassembly to make sure?

>> But the real question is whether the old way of doing CAS in a loop is beneficial on other, non-Intel architectures.  You either need to get broad consensus from the large SMP architectures (various aarch64 implementations, IBM POWER and Z), or somehow make this opt-in at the source level.
>>
>
> That would be a platform-specific change, and it shows a clear performance improvement on the x86 architecture.
> According to Adhemerval, this change also shows some improvement on the aarch64 architecture.
> If you or someone else still have concerns about performance regressions on other architectures, making
> this opt-in could address those worries.
>
> "
> I checked the change on a 64-core aarch64 machine, but
> differently from the previous patch, this one seems to show improvements:
>
> nr_threads    base        head (SPIN_COUNT=10)    head (SPIN_COUNT=1000)
> 1             27566206    28776779 (4.206770)     28778073 (4.211078)
> 2              8498813     9129102 (6.904173)      7042975 (-20.670782)
> 7              5019434     5832195 (13.935765)     5098511 (1.550982)
> 14             4379155     6507212 (32.703053)     5200018 (15.785772)
> 28             4397464     4584480 (4.079329)      4456767 (1.330628)
> 56             4020956     3534899 (-13.750237)    4096197 (1.836850)
> "

Ah, nice, I had missed that.  I suppose this means we can risk enabling it by default.

Thanks,
Florian