public inbox for libc-alpha@sourceware.org
* pthread condvars and priority inheritance
@ 2017-09-26  0:05 Rodriguez, Sergio SF
  2017-09-26  4:21 ` Carlos O'Donell
  0 siblings, 1 reply; 11+ messages in thread
From: Rodriguez, Sergio SF @ 2017-09-26  0:05 UTC (permalink / raw)
  To: libc-alpha

Hi everyone,

I was looking at bug
https://sourceware.org/bugzilla/show_bug.cgi?id=11588 and was wondering
if someone in the glibc community is looking at it, or if the bug is
being tracked somewhere else?

Thanks in advance
Sergio Rodriguez  

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: pthread condvars and priority inheritance
  2017-09-26  0:05 pthread condvars and priority inheritance Rodriguez, Sergio SF
@ 2017-09-26  4:21 ` Carlos O'Donell
  2017-09-26 22:21   ` Rodriguez, Sergio SF
  0 siblings, 1 reply; 11+ messages in thread
From: Carlos O'Donell @ 2017-09-26  4:21 UTC (permalink / raw)
  To: Rodriguez, Sergio SF, libc-alpha

On 09/25/2017 06:05 PM, Rodriguez, Sergio SF wrote:
> Hi everyone,
> 
> I was looking at the bug 
> https://sourceware.org/bugzilla/show_bug.cgi?id=11588 and was wondering
> if someone on the glibc community is looking at this, or if the bug is
> being tracked somewhere else?

Nobody I know is working on this issue right now.

Torvald's improved condition variable implementation was the most
recent work.

Darren Hart's PI work for condvars would have to be adapted in some way
to the new implementation of condvars.

Bug 11588 is the right place to track status.

I recommend having detailed discussions on libc-alpha, and only posting
status updates or consensus conclusions in the bug; that way it's
easier to follow the status.

-- 
Cheers,
Carlos.


* Re: pthread condvars and priority inheritance
  2017-09-26  4:21 ` Carlos O'Donell
@ 2017-09-26 22:21   ` Rodriguez, Sergio SF
  2017-10-04 12:09     ` Torvald Riegel
  0 siblings, 1 reply; 11+ messages in thread
From: Rodriguez, Sergio SF @ 2017-09-26 22:21 UTC (permalink / raw)
  To: libc-alpha

On Mon, 2017-09-25 at 22:21 -0600, Carlos O'Donell wrote:
> On 09/25/2017 06:05 PM, Rodriguez, Sergio SF wrote:
> > 
> > Hi everyone,
> > 
> > I was looking at the bug
> > https://sourceware.org/bugzilla/show_bug.cgi?id=11588 and was
> > wondering if someone on the glibc community is looking at this, or
> > if the bug is being tracked somewhere else?
> Nobody I know is working on this issue right now.
> 
> Torvald's improved condition variables was the most recent work.
> 
I have been looking at the newest condvar implementation, and in the
bug thread Torvald Riegel lists the main issues, especially the one in
the fifth bullet, where a waiter's priority needs to be boosted in
order to notify the signaler that it has been woken up. AFAIK, the
current kernel futex operations do not provide a way for a waiter to
boost its priority during the futex wait period. Was there any thought
or discussion on enhancing the kernel futex?

https://sourceware.org/bugzilla/show_bug.cgi?id=11588#c53

> Darren Hart's PI work for condvars has to be updated in some way
> to the new implementation of condvars.
> 

Understood. Is this the previous work from Darren?

https://sourceware.org/bugzilla/attachment.cgi?id=7687

It was meant to fix a potential scenario that could hit an unbounded
priority inversion on an internal condvar lock used in
pthread_cond_wait() in the previous condvar implementation. The new
condvar has an internal lock used in pthread_cond_signal() that can
have a similar issue, so similar work might be doable. It seems to me
the only constraint is that the pthread_cond_t structure has already
run out of space for a 32-bit field for the internal lock to store the
TID for the FUTEX_LOCK_PI operation, which was used in Darren's PI
work. Is there something I might be missing?

> Bug 11588 is the right place to track status.
> 
> I recommend having detailed discussions on libc-alpha, and only
> posting status updates or consensus conclusions in the bug, that
> way it's easier to follow the status.
> 

Thanks, I will.


* Re: pthread condvars and priority inheritance
  2017-09-26 22:21   ` Rodriguez, Sergio SF
@ 2017-10-04 12:09     ` Torvald Riegel
  2017-10-06  0:26       ` Rodriguez, Sergio SF
  0 siblings, 1 reply; 11+ messages in thread
From: Torvald Riegel @ 2017-10-04 12:09 UTC (permalink / raw)
  To: Rodriguez, Sergio SF; +Cc: libc-alpha

On Tue, 2017-09-26 at 22:21 +0000, Rodriguez, Sergio SF wrote:
> On Mon, 2017-09-25 at 22:21 -0600, Carlos O'Donell wrote:
> > On 09/25/2017 06:05 PM, Rodriguez, Sergio SF wrote:
> > > Hi everyone,
> > > 
> > > I was looking at the bug
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=11588 and was
> > > wondering if someone on the glibc community is looking at this, or
> > > if the bug is being tracked somewhere else?
> > Nobody I know is working on this issue right now.
> > 
> > Torvald's improved condition variables was the most recent work.
> > 
> I have been looking at the newest condvar implementation, and in the
> bug thread Torvald Riegel lists the main issues, especially the one in
> the fifth bullet, where a waiter's priority needs to be boosted in
> order to notify the signaler that it has been woken up. AFAIK, the
> current kernel futex operations do not provide a way for a waiter to
> boost its priority during the futex wait period. Was there any thought
> or discussion on enhancing the kernel futex?
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=11588#c53

Latest status AFAIK is what got discussed at last year's real-time
summit:
https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule
https://www.youtube.com/playlist?list=PLbzoR-pLrL6oauKOYMdNjN9RuTSmC6nfR

Thomas Gleixner, Peter Zijlstra, and I were talking about meeting and
brain-storming about revising futexes, but I didn't end up having any
time for that so far.  I believe this needs new futex
operations/functionality, given all the POSIX and RT constraints we
have.

> Understood. Is this the previous work from Darren?
> 
> https://sourceware.org/bugzilla/attachment.cgi?id=7687
> 
> It was meant to fix a potential scenario that could hit an unbounded
> priority inversion on an internal condvar lock used in
> pthread_cond_wait() in the previous condvar implementation. The new
> condvar has an internal lock used in pthread_cond_signal() that can
> have a similar issue, so similar work might be doable. It seems to me
> the only constraint is that the pthread_cond_t structure has already
> run out of space for a 32-bit field for the internal lock to store the
> TID for the FUTEX_LOCK_PI operation, which was used in Darren's PI
> work. Is there something I might be missing?

The ABA on 32b futexes is one issue; if we had a 64b futex, ideally one
that can wake up more selectively, we could avoid the need to do
blocking.  I explained the trade-offs in more detail in my RT Summit
talk.


* Re: pthread condvars and priority inheritance
  2017-10-04 12:09     ` Torvald Riegel
@ 2017-10-06  0:26       ` Rodriguez, Sergio SF
  2017-10-06  9:42         ` Torvald Riegel
  0 siblings, 1 reply; 11+ messages in thread
From: Rodriguez, Sergio SF @ 2017-10-06  0:26 UTC (permalink / raw)
  To: triegel; +Cc: libc-alpha

On Wed, 2017-10-04 at 14:09 +0200, Torvald Riegel wrote:
> On Tue, 2017-09-26 at 22:21 +0000, Rodriguez, Sergio SF wrote:
> > [...]
> > https://sourceware.org/bugzilla/show_bug.cgi?id=11588#c53
> Latest status AFAIK is what got discussed at last year's real-time
> summit:
> https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule
> https://www.youtube.com/playlist?list=PLbzoR-pLrL6oauKOYMdNjN9RuTSmC6nfR
> 
> Thomas Gleixner, Peter Zijlstra, and I were talking about meeting and
> brain-storming about revising futexes, but I didn't end up having any
> time for that so far.  I believe this needs new futex
> operations/functionality, given all the POSIX and RT constraints we
> have.
> 
> > Understood. The previous work from Darren is this right?
> > 
> > https://sourceware.org/bugzilla/attachment.cgi?id=7687
> > [...]
> The ABA on 32b futexes is one issue; if we had a 64b futex, ideally one
> that can wake up more selectively, we could avoid the need to do
> blocking.  I explained the trade-offs in more detail in my RT Summit
> talk.
> 

Okay. Just to make sure I understood correctly: I am assuming that if
we had a 64b futex, then we would not need waiter groups as we have
currently, as you could embed some info in the extra space in the 64b
futex and use that info to wake up waiters in a more ordered sequence.
As a result, quiescence and locking would not be needed. Is that right?


* Re: pthread condvars and priority inheritance
  2017-10-06  0:26       ` Rodriguez, Sergio SF
@ 2017-10-06  9:42         ` Torvald Riegel
  2017-10-10  0:22           ` Rodriguez, Sergio SF
  0 siblings, 1 reply; 11+ messages in thread
From: Torvald Riegel @ 2017-10-06  9:42 UTC (permalink / raw)
  To: Rodriguez, Sergio SF; +Cc: libc-alpha

On Fri, 2017-10-06 at 00:26 +0000, Rodriguez, Sergio SF wrote:
> On Wed, 2017-10-04 at 14:09 +0200, Torvald Riegel wrote:
> > [...]
> > The ABA on 32b futexes is one issue; if we had a 64b futex, ideally
> > one that can wake up more selectively, we could avoid the need to do
> > blocking.  I explained the trade-offs in more detail in my RT Summit
> > talk.
> 
> Okay. Just to make sure I understood correctly: I am assuming that if
> we had a 64b futex, then we would not need waiter groups as we have
> currently, as you could embed some info in the extra space in the 64b
> futex and use that info to wake up waiters in a more ordered sequence.
> As a result, quiescence and locking would not be needed. Is that right?

That's not quite the full story.  The 64b futex would avoid the ABA
problem, and thus the need for quiescence.  The waiter groups would
still be necessary to satisfy the normal ordering requirements (ie,
which waiter is eligible to consume a signal); the groups could
potentially be avoided if the futex provided stronger (and suitable)
guarantees regarding which threads are woken up.


* Re: pthread condvars and priority inheritance
  2017-10-06  9:42         ` Torvald Riegel
@ 2017-10-10  0:22           ` Rodriguez, Sergio SF
  2017-10-10 13:41             ` Torvald Riegel
  0 siblings, 1 reply; 11+ messages in thread
From: Rodriguez, Sergio SF @ 2017-10-10  0:22 UTC (permalink / raw)
  To: triegel; +Cc: libc-alpha

On Fri, 2017-10-06 at 11:42 +0200, Torvald Riegel wrote:
> On Fri, 2017-10-06 at 00:26 +0000, Rodriguez, Sergio SF wrote:
> > 
> > On Wed, 2017-10-04 at 14:09 +0200, Torvald Riegel wrote:
> > > 
> > > On Tue, 2017-09-26 at 22:21 +0000, Rodriguez, Sergio SF wrote:
> > > 
> > > > 
> > > > 
> > > 
> > > > 
> > > > On Mon, 2017-09-25 at 22:21 -0600, Carlos O'Donell wrote:
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > On 09/25/2017 06:05 PM, Rodriguez, Sergio SF wrote:
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > Hi everyone,
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > I was looking at the bug 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=11588 and
> > > > > > was
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > wondering
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > if someone on the glibc community is looking at this, or if
> > > > > > the
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > bug
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > is
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > being tracked somewhere else?
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > Nobody I know is working on this issue right now.
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > Torvald's improved condition variables was the most recent
> > > > > work.
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> > > > 
> > > > I been looking at the newest condvar implementation
> > > 
> > > > 
> > > > and in the bug thread
> > > 
> > > > 
> > > > Torvald Riegel lists the main issues there, specially the one
> > > > in
> > > 
> > > > 
> > > > the
> > > 
> > > > 
> > > > fifth bullet where waiter priority needs to be boosted in order
> > > > to
> > > 
> > > > 
> > > > notify signaler that they have been woken up. Afaik, current
> > > > kernel
> > > 
> > > > 
> > > > futex operation does not have such functionality for waiter to
> > > 
> > > > 
> > > > boost
> > > 
> > > > 
> > > > priority during futex wait period. Was there any thought or
> > > 
> > > > 
> > > > discussion
> > > 
> > > > 
> > > > on enhancing the kernel futex?
> > > 
> > > > 
> > > > 
> > > 
> > > > 
> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=11588#c53
> > > 
> > > Latest status AFAIK is what got discussed at last year's real-
> > > time
> > > 
> > > summit:
> > > 
> > > https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/sc
> > > hedu
> > > 
> > > le
> > > 
> > > https://www.youtube.com/playlist?list=PLbzoR-pLrL6oauKOYMdNjN9RuT
> > > SmC6
> > > 
> > > nfR
> > > 
> > > 
> > > 
> > > Thomas Gleixner, Peter Zijlstra, and I were talking about meeting
> > > and
> > > 
> > > brain-storming about revising futexes, but I didn't end up having
> > > any
> > > 
> > > time for that so far.  I believe this needs new futex
> > > 
> > > operations/functionality, given all the POSIX and RT constraints
> > > we
> > > 
> > > have.
> > > 
> > > 
> > > 
> > > > 
> > > > 
> > > 
> > > > 
> > > > Understood. The previous work from Darren is this right?
> > > 
> > > > 
> > > > 
> > > 
> > > > 
> > > > https://sourceware.org/bugzilla/attachment.cgi?id=7687
> > > 
> > > > 
> > > > 
> > > 
> > > > 
> > > > It was to fix a potential scenario that can hit an unbounded
> > > 
> > > > 
> > > > priority inversion on an internal condvar lock used in
> > > 
> > > > 
> > > > pthread_cond_wait() in previous condvar implementation. The new
> > > 
> > > > 
> > > > condvar
> > > 
> > > > 
> > > > has an internal lock used in pthread_cond_signal() and that can
> > > 
> > > > 
> > > > have
> > > 
> > > > 
> > > > similar issue. A similar work might be doable. It seems to me
> > > > the
> > > 
> > > > 
> > > > only
> > > 
> > > > 
> > > > constraint is the pthread_cond_t structure already ran out of
> > > > space
> > > 
> > > > 
> > > > for
> > > 
> > > > 
> > > > a 32-bit field for the internal lock for storing the TID for
> > > > the
> > > 
> > > > 
> > > > FUTEX_LOCK_PI operation which was used in Darrens's PI work. Is
> > > 
> > > > 
> > > > there
> > > 
> > > > 
> > > > something I might be missing?
> > > 
> > > The ABA on 32b futexes is one issue; if we had a 64b futex,
> > > ideally
> > > 
> > > one
> > > 
> > > that can wake up more selectively, we could avoid the need to do
> > > 
> > > blocking.  I explained the trade-offs in more detail in my RT
> > > Summit
> > > 
> > > talk.
> > > 
> > > 
> > 
> > Okay. Just make sure I understood correctly. I am assuming if we
> > had
> > 64b futex, then we would not need waiter groups as we have
> > currently as
> > you could embed some info in the extra space in the 64b futex and
> > use
> > that info to wake up waiters in more ordered sequence. As a result,
> > quiescent and locking would not be needed. Is that right? 
> That's not quite the full story.  The 64b futex would avoid the need
> for
> ABA, and thus quiescence.  The waiter groups would still be necessary
> to
> get the normal ordering requirements satisfied (ie, which waiter is
> eligible to consume a signal); the groups could potentially be
> avoided
> if the futex would provide stronger (and suitable) guarantees
> regarding
> which threads are woken up. 

Thanks, that clarified some doubts I had. In your presentation you
mentioned versioning the futex word in the 64b futex. Would that
versioning info be set and used solely by user space and be opaque to
the kernel? I am trying to understand more about it, so correct me if I
am wrong: I am assuming the version info would be used as follows. The
waiter would check the version information from the futex word and
determine whether the signal was intended for it to consume. If not, it
would send an artificial one. Is my understanding correct?

I found this previous conversation on 64b futexes on lkml and am in the
process of understanding it.

https://groups.google.com/forum/#!topic/linux.kernel/kdZwuuTIL20



* Re: pthread condvars and priority inheritance
  2017-10-10  0:22           ` Rodriguez, Sergio SF
@ 2017-10-10 13:41             ` Torvald Riegel
  2017-10-11 16:16               ` Rodriguez, Sergio SF
  0 siblings, 1 reply; 11+ messages in thread
From: Torvald Riegel @ 2017-10-10 13:41 UTC (permalink / raw)
  To: Rodriguez, Sergio SF; +Cc: libc-alpha, Thomas Gleixner

On Tue, 2017-10-10 at 00:22 +0000, Rodriguez, Sergio SF wrote:
> Thanks, that clarified some doubts I had. In your presentation you
> mentioned versioning the futex word in the 64b futex. Would that
> versioning info be set and used solely by user space and be opaque to
> the kernel?

The kernel would use a 64b value as futex word, and userspace would add
some kind of "versioning" so that userspace can avoid ABA situations.

One could also argue that reaching a 32b overflow and thus ABA would be
rare, but that never seemed unlikely enough for me to just ignore it.
With 64b, we definitely can ignore it.
Nonetheless, one could also argue that having to do quiescence is a rare
event (32b overflow), and so a (soft) realtime client could perhaps just
deal with a rare chance of missing deadlines.  The critical scenario is
maybe more likely than actually hitting the ABA.

> I am trying to understand more about it, so correct me if I am wrong:
> I am assuming the waiter would check the version information from the
> futex word and determine whether the signal was intended for it to
> consume.

Yes, something like that.

> If not, it would send an artificial one. Is my understanding correct?
> 
> I found this previous conversation on 64b futexes on lkml and am in
> the process of understanding it.
> 
> https://groups.google.com/forum/#!topic/linux.kernel/kdZwuuTIL20
> 

I've read that conversation in the past, and have seen Linus' comment
on why he thinks it's not necessary.  He seems to think only about
synchronization needs that are similar to mutual exclusion, and that
essentially boil down to a simple boolean flag: is the lock acquired,
or isn't it?
But condvars are not like that, as they have to represent an ordering
(ie, only some waiters are eligible to consume signals, so there is more
information that must not get lost).

64b futexes could also help in cases where userspace wants to
synchronize through futex words that are full pointers on 64b systems.

If I had to design new futex operations, I'd probably start at 64b (or
at least machine word size on the respective arch, so at least
pointer-type-sized) and think about adding operations that (1) can cover
a wider range of conditions (eg, less-than vs. just equality of futex
word to expected value) and (2) perhaps operations that atomically
update the futex word (though I'd have to page in my past thoughts on
that to remember what precisely I saw a need for, and why).

Have you talked to Thomas Gleixner yet?


* Re: pthread condvars and priority inheritance
  2017-10-10 13:41             ` Torvald Riegel
@ 2017-10-11 16:16               ` Rodriguez, Sergio SF
  2017-10-25 23:35                 ` Rodriguez, Sergio SF
  0 siblings, 1 reply; 11+ messages in thread
From: Rodriguez, Sergio SF @ 2017-10-11 16:16 UTC (permalink / raw)
  To: triegel; +Cc: tglx, libc-alpha

On Tue, 2017-10-10 at 15:41 +0200, Torvald Riegel wrote:
> On Tue, 2017-10-10 at 00:22 +0000, Rodriguez, Sergio SF wrote:
> > 
> > Thanks, that clarified some doubts I had. In your presentation where
> > you mentioned versioning futex word in the 64b futex. Would that
> > versioning info be set and used solely by user space and opaque to
> > kernel?
> The kernel would use a 64b value as futex word, and userspace would
> add some kind of "versioning" so that userspace can avoid ABA
> situations.

Okay. I was thinking the version info could live outside of the 32b
futex word and be handled by userspace, if it is not used by the kernel.

> One could also argue that reaching a 32b overflow and thus ABA would
> be rare, but that never seemed unlikely enough for me to just ignore
> it.  With 64b, we definitely can ignore it.

Understood. The 32b overflow of the signal count is a concern when we
don't do quiescence for the group switch.

> Nonetheless, one could also argue that having to do quiescence is a
> rare event (32b overflow), and so a (soft) realtime client could
> perhaps just deal with a rare chance of missing deadlines.  The
> critical scenario is maybe more likely than actually hitting the ABA.
> 

Maybe this can be a 2-step approach: first, remove the quiescence, with
version info outside of the 32b futex, just for specific real-time
applications, to avoid priority inversion at the cost of the limited
signal count in the 32b futex. The 64b futex, which requires more
discussion, can then be added as the next step to address the 32b
overflow.

> > 
> > I am trying to understand more about it and you can correct me
> > I am assuming that how the version info would be used is, that the
> > waiter would check the version information from the futex word and
> > determine if the signal was intended for it to consume.
> Yes, something like that.
> 
> > 
> > If not, it
> > would send an artificial one. Is my understanding correct?
> > 
> > I found this previous conversion on 64b futex on lkml and am in
> > process
> > understanding it.  
> > 
> > https://groups.google.com/forum/#!topic/linux.kernel/kdZwuuTIL20
> > 
> I've read that conversation in the past, and have seen Linus' comment
> why he thinks it's not necessary.  He seems to think only about
> synchronization needs that are similar to mutual exclusion, and
> essentially boil down to a simple boolean flag: is the lock acquired,
> or isn't it?
> But condvars are not like that, as they have to represent an ordering
> (ie, only some waiters are eligible to consume signals, so there is
> more information that must not get lost).
> 
> 64b futexes could also help in cases where userspace wants to
> synchronize through futex words that are full pointers on 64b systems.
> 
> If I had to design new futex operations, I'd probably start at 64b (or
> at least machine word size on the respective arch, so at least
> pointer-type-sized) and think about adding operations that (1) can
> cover a wider range of conditions (eg, less-than vs. just equality of
> futex word to expected value) and (2) perhaps operations that
> atomically update the futex word (though I'd have to page in my past
> thoughts on that to remember what precisely I saw a need for, and why).
> 
> Have you talked to Thomas Gleixner yet?
> 
No, I have only started the conversation on this thread. The 64b futex
seems to be a good option to address the 32b futex ABA issue, and you
brought up a few other potential uses of a 64b futex. If we can get
some comments and feedback from Thomas, that would be good.


* Re: pthread condvars and priority inheritance
  2017-10-11 16:16               ` Rodriguez, Sergio SF
@ 2017-10-25 23:35                 ` Rodriguez, Sergio SF
  2017-12-09  0:36                   ` Rodriguez, Sergio SF
  0 siblings, 1 reply; 11+ messages in thread
From: Rodriguez, Sergio SF @ 2017-10-25 23:35 UTC (permalink / raw)
  To: triegel; +Cc: tglx, libc-alpha

On Wed, 2017-10-11 at 09:20 -0700, Rodriguez, Sergio SF wrote:
> On Tue, 2017-10-10 at 15:41 +0200, Torvald Riegel wrote:
> > 
> > On Tue, 2017-10-10 at 00:22 +0000, Rodriguez, Sergio SF wrote:
> > > Thanks, that clarified some doubts I had. In your presentation
> > > where you mentioned versioning futex word in the 64b futex. Would
> > > that versioning info be set and used solely by user space and
> > > opaque to kernel?
> > The kernel would use a 64b value as futex word, and userspace would
> > add some kind of "versioning" so that userspace can avoid ABA
> > situations.
> Okay. I was thinking the version info can be outside of the 32b futex
> word handled by the userspace, if not used by kernel.
> 
> > One could also argue that reaching a 32b overflow and thus ABA would
> > be rare, but that never seemed unlikely enough for me to just ignore
> > it.  With 64b, we definitely can ignore it.
> Understood. The 32b overflow on signal count is a concern when we
> don't do quiescence for group switch.
> 
> > Nonetheless, one could also argue that having to do quiescence is a
> > rare event (32b overflow), and so a (soft) realtime client could
> > perhaps just deal with a rare chance of missing deadlines.  The
> > critical scenario is maybe more likely than actually hitting the ABA.
> 
> Maybe this can be a 2-step approach, first remove the quiescence with
> version info outside of the 32b futex just for specific real-time
> application to avoid priority inversion with this trade-off or
> limitation on the signal count in 32b futex. 64b futex which requires
> more discussion can be the next step to be added to address the 32b
> overflow.
> 
I did an experiment removing the quiescence, as I wanted to see its
effect. I also created a test case that can generate priority
inversion; I wanted to see whether the priority inversion would still
happen without quiescence. What I observed is disordered wake-ups when
there is no quiescence. If I understood correctly, the purpose of
quiescence is 1) to make sure group G1 is empty of waiters before its
slot is reused (by the group switch), and 2) to maintain the ordering
of group wake-ups. The stronger wake-up ordering of the current scheme
is achieved not just by the group switch but also by the quiescence:
we need some kind of information telling us that there are still older
waiters that have not consumed their signals and that we should not
send a signal yet (as that signal would wake a waiter from a more
recent group before the older waiters are woken). I have not found a
way to achieve this without quiescence.

Maybe we should explore using priority inheritance on top of the
current scheme, instead of removing the source of priority inversion
as I had attempted to do. I will look at that next.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: pthread condvars and priority inheritance
  2017-10-25 23:35                 ` Rodriguez, Sergio SF
@ 2017-12-09  0:36                   ` Rodriguez, Sergio SF
  0 siblings, 0 replies; 11+ messages in thread
From: Rodriguez, Sergio SF @ 2017-12-09  0:36 UTC (permalink / raw)
  To: triegel; +Cc: tglx, libc-alpha

> > Maybe this can be a 2-step approach, first remove the quiescence
> > with
> > version info outside of the 32b futex just for specific real-time
> > application to avoid priority inversion with this trade-off or
> > limitation on the signal count in 32b futex. 64b futex which
> > requires
> > more discussion can be the next step to be added to address the 32b
> > overflow.
> > 
> I did some experiment removing quiescence as I wanted to see the
> effect. I also created a test case that can generate priority
> inversion. I wanted to see if the priority inversion would happen
> when
> there is no quiescence. I see dis-ordered wake-up when there is no
> quiescence. If I understood correctly, the purpose of quiescence is
> 1)
> to make sure waiters in group 1 is empty before re-using the group
> memory (by group switch), and 2) to maintain the ordering of group
> wake-up. The stronger wake-up ordering implemented by current scheme
> is
> achieved not just by group switch but also along with the quiescence.
> We need some kind of information to tell there are still older
> waiters
> that have not consumed the signals and that we should not send a
> signal
> (as the signal would wake a waiter from more recent group before the
> older waiters are woken). I have not found a way to achieve this
> without quiescence.
> 
> Maybe we should explore using priority inheritance on top of current
> scheme, instead removing the source of priority inversion as I had
> attempted to do. I will look at that next.
> 

Hi Torvald,

Just to follow up on the previous discussion. I am looking into an
option for adding PI so that we can boost waiter priority, allowing
the waiters to promptly notify the signaler that they have been woken
up, so that the priority inversion caused by blocking in group
quiescence can be avoided. I'd like to get some feedback before I
start putting effort into implementing this proposal.

I intend to keep the group quiescence, which the scheme requires both
for maintaining ordered group wake-up (the purpose of the current
scheme) and for avoiding the ABA issue. Also, removing the quiescence
would cause disordered wake-ups, as described in my previous
experiment.

The proposal requires a new futex operation; if this looks promising,
I plan to send a proposal for that operation to lkml as well.

For adding PI support, what we are looking for is a way to boost the
priority of a waiter thread so that it is not preempted and does not
delay waking the signaler thread. pthread_cond_wait() does not hold
any lock here, as __g_refs is just a futex word, not a lock. So what
we need is a way to boost a thread's priority without that thread
actually holding a lock, and the boosting has to last from when the
signaler enters the quiescence blocking until the last waiter is
woken.

Consider a new futex operation, which I will call FUTEX_WAIT_WAITERS_PI
for this discussion. Below is a rough high-level description.

FUTEX_WAIT_WAITERS_PI

Arguments - uaddr, val

This futex operation would check whether there are any pending waiters
on the futex uaddr, based on a kernel-maintained waiter tree (which
would be a priority-sorted rbtree). If there are no pending waiters,
the operation returns to user space. If there are still pending
waiters, the kernel boosts their priority to the priority of the
calling thread (if the calling thread has higher priority) before
putting the calling thread to sleep. When the last waiter is woken,
the kernel also sets up a wake queue to wake the thread that called
FUTEX_WAIT_WAITERS_PI.

Just to give some idea of how it would be used in pthread_cond_*:

diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
index ffbbde4106..21403a66df 100644
--- a/nptl/pthread_cond_common.c
+++ b/nptl/pthread_cond_common.c
@@ -409,7 +409,10 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
          r = atomic_fetch_or_relaxed (cond->__data.__g_refs + g1, 1);

          if ((r >> 1) > 0)
-           futex_wait_simple (cond->__data.__g_refs + g1, r, private);
+           {
+             unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g1);
+             futex_wait_waiters_pi (cond->__data.__g_signals + g1, signals, private);
+           }

          /* Reload here so we eventually see the most recent value even if we
             do not spin.   */
          r = atomic_load_relaxed (cond->__data.__g_refs + g1);
diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 7812b94a3a..691ef5b6f9 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -148,17 +148,7 @@ __condvar_cancel_waiting (pthread_cond_t *cond, uint64_t seq, unsigned int g,
 static void
 __condvar_dec_grefs (pthread_cond_t *cond, unsigned int g, int private)
 {
-  /* Release MO to synchronize-with the acquire load in
-     __condvar_quiesce_and_switch_g1.  */
-  if (atomic_fetch_add_release (cond->__data.__g_refs + g, -2) == 3)
-    {
-      /* Clear the wake-up request flag before waking up.  We do not need more
-        than relaxed MO and it doesn't matter if we apply this for an aliased
-        group because we wake all futex waiters right after clearing the
-        flag.  */
-      atomic_fetch_and_relaxed (cond->__data.__g_refs + g, ~(unsigned int) 1);
-      futex_wake (cond->__data.__g_refs + g, INT_MAX, private);
-    }
+  atomic_fetch_add_release (cond->__data.__g_refs + g, -2);
 }

The FUTEX_WAIT_WAITERS_PI operation would do just what the __g_refs
futex does today, i.e. put the signaler thread to sleep while there
are still pending G1 waiters that have not been woken, but without
having to use another futex word (__g_refs). The priority boosting of
the waiter threads would be provided by the PI part of the futex
operation (i.e. the rt-mutex backend).

The other benefit is that we might no longer need to track the
pending-waiter count as we currently do in __g_refs, since the
proposed futex operation takes care of that. As a result, we might be
able to remove __g_refs from the pthread_cond_t structure.
 
I am also looking into the other case of priority inversion, caused by
the internal lock used in pthread_cond_signal() and
pthread_cond_broadcast(). That internal lock is implemented using 2
bits of the pthread_cond_t structure and does not support PI. One
option I am currently prototyping is to use the existing FUTEX_LOCK_PI
and FUTEX_UNLOCK_PI operations to address that. This would be similar
to what Darren had previously done for the glibc 2.20 condvar to fix a
PI gap caused by an internal lock in that implementation.
 
The other case I also hope to address is the thundering-herd issue in
the current pthread_cond_broadcast(). I think we can change it to use
the existing FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI futex
operations. I have implemented this part with some initial tests done
and am currently doing more stress testing.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-12-09  0:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-26  0:05 pthread condvars and priority inheritance Rodriguez, Sergio SF
2017-09-26  4:21 ` Carlos O'Donell
2017-09-26 22:21   ` Rodriguez, Sergio SF
2017-10-04 12:09     ` Torvald Riegel
2017-10-06  0:26       ` Rodriguez, Sergio SF
2017-10-06  9:42         ` Torvald Riegel
2017-10-10  0:22           ` Rodriguez, Sergio SF
2017-10-10 13:41             ` Torvald Riegel
2017-10-11 16:16               ` Rodriguez, Sergio SF
2017-10-25 23:35                 ` Rodriguez, Sergio SF
2017-12-09  0:36                   ` Rodriguez, Sergio SF

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).