From: kemi <kemi.wang@intel.com>
To: Florian Weimer <fweimer@redhat.com>,
	Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	Glibc alpha <libc-alpha@sourceware.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Andi Kleen <andi.kleen@intel.com>,
	Ying Huang <ying.huang@intel.com>, Aaron Lu <aaron.lu@intel.com>,
	Lu Aubrey <aubrey.li@intel.com>
Subject: Re: [PATCH v2 1/3] Tunables: Add tunables of spin count for pthread adaptive spin mutex
Date: Mon, 14 May 2018 04:06:00 -0000
Message-ID: <8dbdd127-01ad-5341-1824-52bd18f1d183@intel.com>
In-Reply-To: <3b8bb7bf-68a5-4084-e4dd-bb4fe4411bef@redhat.com>



On 2018-05-08 23:44, Florian Weimer wrote:
> On 05/02/2018 01:06 PM, kemi wrote:
>> Hi, Florian
>>      Thanks for your time to review.
>>
>> On 2018-05-02 16:04, Florian Weimer wrote:
>>> On 04/25/2018 04:56 AM, Kemi Wang wrote:
>>>
>>>> +  mutex {
>>>> +    spin_count {
>>>> +      type: INT_32
>>>> +      minval: 0
>>>> +      maxval: 30000
>>>> +      default: 1000
>>>> +    }
>>>
>>> How did you come up with the default and maximum values?  Larger maximum values might be useful for testing boundary conditions.
>>>
>>
>> For the maximum value of spin count:
>> Please note that mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8, and the variable *cnt* can reach
>> the value of spin count when spinning times out. In that case, mutex->__data.__spins increases and can get close to *cnt*
>> (i.e. close to the value of spin count). Keeping the value of spin count below SHRT_MAX avoids overflowing the
>> mutex->__data.__spins variable, which may be of type short.
> 
> Could you add this as a comment, please?
> 

Sure:)
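
For reference, a small standalone model (not the glibc code itself) of why the 30000 maxval is safe: even if every
spin times out, the exponential moving average only pulls __spins toward the spin count, which stays below
SHRT_MAX (32767):

#include <limits.h>
#include <stdio.h>

int
main (void)
{
  /* Model __spins as a short, as in struct __pthread_mutex_s.  */
  short spins = 0;
  int spin_count = 30000;   /* proposed maxval, below SHRT_MAX */

  /* Worst case: every acquisition times out, so cnt == spin_count
     and __spins keeps being pulled upward.  */
  for (int i = 0; i < 200; i++)
    {
      int cnt = spin_count;
      spins += (cnt - spins) / 8;   /* same update as in the mutex code */
    }

  printf ("__spins settles at %d, SHRT_MAX is %d\n", spins, SHRT_MAX);
  return 0;
}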

>> For the default value of spin count:
>> I referred to the previous count of 100 trylock attempts in the loop. Since the mode is changed to read-only spinning,
>> I suppose the value could be larger, because a plain read has lower overhead and latency than cmpxchg.
> 
> Ahh, makes sense.  Perhaps put this information into the commit message.
> 

I have investigated the default value of spin count some more recently.

It's clear that we should provide a larger default value, since spinning changes from trylock (cmpxchg) to
read-only spinning. But it is not trivial to determine the best value, if a single best value exists at all.
The relative latency of an atomic operation such as cmpxchg and a plain read depends on many factors, such as which
core owns the cache line, the cache line's coherence state (M/E/S/I, or even O), cache line transfers, and so on.
A previous report [1] (Fig. 2 in that paper) has shown that, under the same conditions, cmpxchg latency is about
1.5x that of a single read on Haswell.
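
To make the comparison concrete, here is a rough standalone sketch of the two spinning styles (the lock word and
helpers below are simplified stand-ins, not the actual patch code):

#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the lock word and LLL_MUTEX_TRYLOCK; the real code uses
   the futex word inside the mutex.  */
static bool
try_lock (atomic_int *lock)
{
  int expected = 0;
  return atomic_compare_exchange_strong (lock, &expected, 1);
}

/* Old style: every iteration is a cmpxchg, which keeps requesting the
   cache line for ownership even while the lock is held.  */
static bool
spin_with_trylock (atomic_int *lock, int spin_count)
{
  for (int cnt = 0; cnt < spin_count; cnt++)
    if (try_lock (lock))
      return true;
  return false;
}

/* New style: spin on plain reads (the cache line can stay shared) and
   only issue the cmpxchg once the lock looks free.  With cmpxchg at
   roughly 1.5x the latency of a read [1], ~150 read iterations cost
   about the same time as ~100 trylock iterations.  */
static bool
spin_with_read (atomic_int *lock, int spin_count)
{
  for (int cnt = 0; cnt < spin_count; cnt++)
    if (atomic_load_explicit (lock, memory_order_relaxed) == 0
        && try_lock (lock))
      return true;
  return false;
}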

So let's set the default value of spin count to 150 and run some benchmarks to test it.

What's your idea?

[1] Lesani, Mohsen, Todd Millstein, and Jens Palsberg. "Automatic Atomicity Verification for Clients of Concurrent Data Structures." 
International Conference on Computer Aided Verification. Springer, Cham, 2014.

>> Perhaps we should choose the default value of spin count differently depending on the architecture.
> 
> Sure, or if there is just a single good choice for the tunable, just use that and remove the tunable again.  I guess one aspect here is to experiment with different values and see if there's a clear winner.
>

Two reasons for keeping the tunable here:
1) Instruction overhead is architecture-specific, so it is hard, or even impossible, to pick one default value that fits all architectures well.
E.g. the pause instruction on Skylake is about 10x more expensive than on earlier platforms.

2) Different kinds of workloads may need a different spin timeout.
I have heard many complaints from customers that the pthread adaptive spin mutex does not work well for their practical workloads.
Let's keep a tunable here for them.

>>>> +# define TUNABLE_CALLBACK_FNDECL(__name, __type)            \
>>>> +static inline void                        \
>>>> +__always_inline                            \
>>>> +do_set_mutex_ ## __name (__type value)            \
>>>> +{                                \
>>>> +  __mutex_aconf.__name = value;                \
>>>> +}                                \
>>>> +void                                \
>>>> +TUNABLE_CALLBACK (set_mutex_ ## __name) (tunable_val_t *valp) \
>>>> +{                                \
>>>> +  __type value = (__type) (valp)->numval;            \
>>>> +  do_set_mutex_ ## __name (value);                \
>>>> +}
>>>> +
>>>> +TUNABLE_CALLBACK_FNDECL (spin_count, int32_t);
>>>
>>> I'm not sure if the macro is helpful in this context.
> 
>> It is a matter of taste.
>> But perhaps we will have other mutex tunables in the future.
> 
> We can still macroize the code at that point.  But no strong preference here.
> 

That's all right.
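
For what it's worth, expanded for spin_count the macro boils down to roughly this, which is also what an
unmacroized version would look like:

static inline void
__always_inline
do_set_mutex_spin_count (int32_t value)
{
  __mutex_aconf.spin_count = value;
}

void
TUNABLE_CALLBACK (set_mutex_spin_count) (tunable_val_t *valp)
{
  int32_t value = (int32_t) valp->numval;
  do_set_mutex_spin_count (value);
}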

>>>> +void (*const __pthread_mutex_tunables_init_array []) (int, char **, char **)
>>>> +  __attribute__ ((section (INIT_SECTION), aligned (sizeof (void *)))) =
>>>> +{
>>>> +  &mutex_tunables_init
>>>> +};
>>>
>>> Can't you perform the initialization as part of overall pthread initialization?  This would avoid the extra relocation.
> 
>> Thanks for your suggestion. I am not sure how to do it now and will take a look at it.
> 
> The code would go into nptl/nptl-init.c.  It's just an idea, but I think it should be possible to make it work.
> 

Thanks, I will take a look at it and see whether we can get a benefit.
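
Very roughly, I read the suggestion as something like the following in nptl/nptl-init.c (hypothetical sketch only;
the exact hook, and whether the arguments are needed at all, still have to be checked):

#if HAVE_TUNABLES
  /* Called from the existing nptl initialization path instead of via the
     separate __pthread_mutex_tunables_init_array entry, avoiding the
     extra relocation.  Arguments follow the init_array prototype.  */
  mutex_tunables_init (0, NULL, NULL);
#endif
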
> Thanks,
> Florian
