public inbox for libc-alpha@sourceware.org
From: kemi <kemi.wang@intel.com>
To: Carlos O'Donell <carlos@redhat.com>,
	Szabolcs Nagy <Szabolcs.Nagy@arm.com>,
	Florian Weimer <fweimer@redhat.com>
Cc: nd <nd@arm.com>, "H.J. Lu" <hjl.tools@gmail.com>,
	libc-alpha <libc-alpha@sourceware.org>
Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
Date: Fri, 11 Jan 2019 12:01:00 -0000	[thread overview]
Message-ID: <3ab17be2-5dd3-bcca-14ed-614bd6404670@intel.com> (raw)
In-Reply-To: <999d5a20-288d-ab46-954f-b13e331ca317@redhat.com>



On 2019/1/11 3:24 AM, Carlos O'Donell wrote:
> On 1/10/19 12:52 PM, Szabolcs Nagy wrote:
>> On 10/01/2019 16:41, Carlos O'Donell wrote:
>>> On 1/10/19 11:32 AM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>
>>>>> My opinion is that for the health and evolution of a NUMA-aware spinlock
>>>>> and MCS lock, that we should create a distinct project and library that
>>>>> should have those locks, and then work to put them into downstream
>>>>> distributions. This will support key users being able to use supported
>>>>> versions of those libraries, and give the needed feedback about the API
>>>>> and the performance. It may take 1-2 years to get that feedback and every
>>>>> piece of feedback will improve the final API/ABI we put into glibc or
>>>>> even into the next ISO C standard as part of the C thread interface.
>>>>
>>>> I think it's something that could land in TBB, for which many
>>>> distributions already have mechanisms to ship updated versions after a
>>>> release.
>>>
>>> Absolutely. That's a great idea.
>>>
>>
>> in principle the pthread_spin_lock api can use this algorithm
>> assuming we can keep the pthread_spinlock_t abi and keep the
>> POSIX semantics. (presumably users ran into issues with the
>> existing posix api.. or how did this come up in the first place?)
>  
> Correct, but meeting the ABI contract of pthread_spinlock_t turns
> out to be hard; there isn't much space. I've spoken with Kemi Wang
> (Intel) about this specific issue, and he has some ideas to share,
> but I'll leave it to him to describe.
> 

It may be possible, because we can make better use of the space inside pthread_spinlock_t.

MCS lock is a well-known method to reduce spinlock overhead by queuing spinners: the spinlock
cache line is only contended between the lock holder and one active spinner, while the other
spinners spin on a locally-accessible flag until the previous spinner passes the lock down.

Usually, a classical MCS implementation requires an extra pointer, *mcs_lock*, to track the tail of the queue.
When a new spinner is added to the queue, we first fetch the current tail of the queue, then move the mcs_lock
pointer to point to the new spinner (the new tail of the queue).
If we can squeeze some space out of pthread_spinlock_t to store this tail info, and update the tail info
whenever a new spinner is added to the queue, then the MCS algorithm can be reimplemented without breaking the ABI.
That is possible because the *lock* value itself does not have to occupy all 32 bits (8 bits, or even one bit, is enough).

Then the pthread_spinlock_t layout may look like this (similar to qspinlock in the kernel):

union pthread_spinlock_t
{
  struct
  {
    uint8_t locked;    /* lock byte */
    uint8_t reserved;
    uint16_t cpuid;    /* CPU id of the last spinner; a per-CPU
                          infrastructure converts it into a pointer to
                          the tail of the queue, e.g.
                          per_cpu_var (qnode, cpuid) */
  };
  int lock;
};

/* One qnode per CPU.  */
struct qnode
{
  struct qnode *next;  /* points to the next spinner */
  int flag;            /* local spinning flag */
};

But there are two problems here:
a) Lack of per-CPU infrastructure support in glibc, so we can't do the cpuid -> per-CPU-variable translation.
b) We can't disable preemption in userland.
   When a new spinner is added to the queue, we need to update the cpuid field of pthread_spinlock_t.
   Pseudo-code:
	newid = get_current_cpuid ();                   /* the CPU we are running on */
	prev = atomic_exchange_acquire (&cpuid, newid); /* publish the new cpuid and get
							   back the previous one */
	tail_node = per_cpu_var (qnode, prev);          /* the previous tail node of the queue */

There is a problem when preemption happens in the window between get_current_cpuid() and atomic_exchange_acquire():
when the thread is rescheduled, it may be running on another CPU with a different cpuid.

===============================CUT HERE==================================
Another way is to store thread-specific info (e.g. the tid) in pthread_spinlock_t instead of a cpuid; then we can
avoid issue b), but it seems that we would break the semantics of TLS? Comments?



