From: kemi <kemi.wang@intel.com>
To: Carlos O'Donell <carlos@redhat.com>,
Szabolcs Nagy <Szabolcs.Nagy@arm.com>,
Florian Weimer <fweimer@redhat.com>
Cc: nd <nd@arm.com>, "H.J. Lu" <hjl.tools@gmail.com>,
libc-alpha <libc-alpha@sourceware.org>
Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
Date: Fri, 11 Jan 2019 12:01:00 -0000 [thread overview]
Message-ID: <3ab17be2-5dd3-bcca-14ed-614bd6404670@intel.com> (raw)
In-Reply-To: <999d5a20-288d-ab46-954f-b13e331ca317@redhat.com>
On 2019/1/11 3:24 PM, Carlos O'Donell wrote:
> On 1/10/19 12:52 PM, Szabolcs Nagy wrote:
>> On 10/01/2019 16:41, Carlos O'Donell wrote:
>>> On 1/10/19 11:32 AM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>
>>>>> My opinion is that for the health and evolution of a NUMA-aware spinlock
>>>>> and MCS lock, that we should create a distinct project and library that
>>>>> should have those locks, and then work to put them into downstream
>>>>> distributions. This will support key users being able to use supported
>>>>> versions of those libraries, and give the needed feedback about the API
>>>>> and the performance. It may take 1-2 years to get that feedback and every
>>>>> piece of feedback will improve the final API/ABI we put into glibc or
>>>>> even into the next ISO C standard as part of the C thread interface.
>>>>
>>>> I think it's something that could land in tbb, for which many
>>>> distributions already have mechanisms to ship updated versions after a
>>>> release.
>>>
>>> Absolutely. That's a great idea.
>>>
>>
>> in principle the pthread_spin_lock api can use this algorithm
>> assuming we can keep the pthread_spinlock_t abi and keep the
>> POSIX semantics. (presumably users ran into issues with the
>> existing posix api.. or how did this come up in the first place?)
>
> Correct, but meeting the ABI contract of pthread_spinlock_t turns
> out to be hard; there isn't much space. I've spoken with Kemi Wang
> (Intel) about this specific issue, and he has some ideas to share,
> but I'll leave it for him to describe.
>
It may be possible, because we can make better use of the space in pthread_spinlock_t.
MCS lock is a well-known method of reducing spinlock overhead by queuing spinners: the spinlock
cache line is contended only between the lock holder and one active spinner, while the other spinners
spin on a locally accessible flag until their predecessor passes the lock down to them.
A classical MCS implementation requires an extra pointer, *mcs_lock*, to track the tail of the queue.
When a new spinner joins the queue, we first fetch the current tail of the queue, then move the mcs_lock
pointer to point to this new spinner (the new tail of the queue).
If we can squeeze out some space in pthread_spinlock_t to store this tail info, and update the tail info
whenever a new spinner joins the queue, then the MCS algorithm can be reimplemented without breaking the ABI.
That is possible because the *lock* field itself does not have to occupy 32 bits (8 bits, or even one bit, should be enough).
The pthread_spinlock_t structure may then look like this (similar to qspinlock in the kernel):
struct pthread_spinlock_t
{
  union {
    struct {
      u8 locked;    /* lock byte */
      u8 reserve;
      u16 cpuid;    /* CPU id of the last spinner; converted via the
                       per-cpu infrastructure into a pointer to the tail
                       of the queue, e.g. per_cpu_var(qnode, cpuid) */
    };
    int lock;
  };
};

PER-CPU struct qnode {
  struct qnode *next;  /* points to the next spinner */
  int flag;            /* local spinning flag */
};
But there are two problems here:
a) Glibc lacks per-cpu infrastructure support, so we can't do the cpuid -> per-cpu-variable translation.
b) Preemption can't be disabled in userland.
When a new spinner joins the queue, we need to update the cpuid in pthread_spinlock_t to the new one.
Pseudo-code:
newid = get_current_cpuid();
prev = atomic_exchange_acquire(&cpuid, newid); /* publish the new cpuid, returning the previous one */
tail_node = per_cpu_var(qnode, prev);          /* get the previous tail node of the queue */
There is a problem if preemption happens in the window between get_current_cpuid() and atomic_exchange_acquire():
when the thread is rescheduled, it may be running on another cpu with a different cpuid.
===============================CUT HERE==================================
Another way is to store thread-specific info (e.g. the tid) in pthread_spinlock_t instead of the cpuid. That avoids
problem b), but it seems we would break the semantics of TLS? Comments?