public inbox for libc-alpha@sourceware.org
* Is this pthread_cond_wait() race possible
@ 2017-06-07 16:48 Sebastian Andrzej Siewior
  2017-06-07 19:41 ` Torvald Riegel
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-07 16:48 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

Hi Torvald,

I've been staring at the new pthread_cond_* code. Could you please tell
me whether this race is possible (two waiting threads see the same
__g_signals value while a third thread posts a signal, before the
waiters go to futex_wait()):

T1                               T2                       T3

__pthread_cond_wait_common()                            
 __condvar_fetch_add_wseq_acquire (cond, 2);
 __pthread_mutex_unlock_usercnt();                       

                                __pthread_cond_wait_common()
                                 __condvar_fetch_add_wseq_acquire (cond, 2);
                                 __pthread_mutex_unlock_usercnt();

                                                        __pthread_cond_signal()
                                                          if ((cond->__data.__g_size[g1] != 0) …

                                                           /* Add a signal. */
                                                           atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, 2);

 signals = atomic_load_acquire (cond->__data.__g_signals + g);
                                 signals = atomic_load_acquire (cond->__data.__g_signals + g);

  signals = 2                     signals = 2

          /* If there is an available signal, don't block.  */
  if (signals != 0)               if (signals != 0)
      break;                         break;

  while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
                                                 &signals, signals - 2));
  signals = 0
                                  while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
                                                &signals, signals - 2));
                                   signals = (int) -2

   …                               …                          …

Or is there maybe some synchronisation that I missed?

Sebastian


* Re: Is this pthread_cond_wait() race possible
  2017-06-07 16:48 Is this pthread_cond_wait() race possible Sebastian Andrzej Siewior
@ 2017-06-07 19:41 ` Torvald Riegel
  2017-06-08  7:39   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 9+ messages in thread
From: Torvald Riegel @ 2017-06-07 19:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

On Wed, 2017-06-07 at 18:48 +0200, Sebastian Andrzej Siewior wrote:
> Hi Torvald,
> 
> I've been staring at the new pthread_cond_* code. Could you please tell
> me whether this race is possible (two waiting threads see the same
> __g_signals value while a third thread posts a signal, before the
> waiters go to futex_wait()):
> 
> T1                               T2                       T3
> 
> __pthread_cond_wait_common()                            
>  __condvar_fetch_add_wseq_acquire (cond, 2);
>  __pthread_mutex_unlock_usercnt();                       
> 
>                                 __pthread_cond_wait_common()
>                                  __condvar_fetch_add_wseq_acquire (cond, 2);
>                                  __pthread_mutex_unlock_usercnt();
> 
>                                                         __pthread_cond_signal()
>                                                           if ((cond->__data.__g_size[g1] != 0) …
> 
>                                                            /* Add a signal. */
>                                                            atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, 2);
> 
>  signals = atomic_load_acquire (cond->__data.__g_signals + g);
>                                  signals = atomic_load_acquire (cond->__data.__g_signals + g);
> 
>   signals = 2                     signals = 2
> 
>           /* If there is an available signal, don't block.  */
>   if (signals != 0)               if (signals != 0)
>       break;                         break;
> 
>   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
>                                                  &signals, signals - 2));
>   signals = 0
>                                   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
>                                                 &signals, signals - 2));
>                                    signals = (int) -2

The second CAS would expect a value of 2 in *(cond->__data.__g_signals +
g).  Because the first CAS succeeded in your example, the second CAS
will fail, and T2 will update its view of __g_signals and store that in
signals.
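
For reference, the ISO C11 compare-exchange has the same on-failure
behaviour; here is a minimal, self-contained sketch of exactly the
situation above, using standard C11 atomics rather than our internal
wrappers:

#include <stdatomic.h>
#include <stdio.h>

int
main (void)
{
  /* State after T1's successful CAS: the signal has been consumed.  */
  atomic_uint g_signals = 0;
  /* T2 still holds the value it loaded before T1's CAS.  */
  unsigned int signals = 2;

  /* T2's CAS: compare against 2, try to store 2 - 2 = 0.  */
  if (!atomic_compare_exchange_weak (&g_signals, &signals, signals - 2))
    /* The compare failed (0 != 2), nothing was stored, and "signals"
       now holds the current value 0, so the next iteration works with
       the refreshed value instead of producing -2.  */
    printf ("CAS failed; signals is now %u\n", signals);

  return 0;
}

Running this prints "CAS failed; signals is now 0", which is exactly the
value T2 re-enters the loop with.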



* Re: Is this pthread_cond_wait() race possible
  2017-06-07 19:41 ` Torvald Riegel
@ 2017-06-08  7:39   ` Sebastian Andrzej Siewior
  2017-06-08  9:51     ` Torvald Riegel
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-08  7:39 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

On 2017-06-07 21:41:38 [+0200], Torvald Riegel wrote:
> On Wed, 2017-06-07 at 18:48 +0200, Sebastian Andrzej Siewior wrote:
> > Hi Torvald,
> > 
> > I've been staring at the new pthread_cond_* code. Could you please tell
> > me whether this race is possible (two waiting threads see the same
> > __g_signals value while a third thread posts a signal, before the
> > waiters go to futex_wait()):
> > 
> > T1                               T2                       T3
> > 
> > __pthread_cond_wait_common()                            
> >  __condvar_fetch_add_wseq_acquire (cond, 2);
> >  __pthread_mutex_unlock_usercnt();                       
> > 
> >                                 __pthread_cond_wait_common()
> >                                  __condvar_fetch_add_wseq_acquire (cond, 2);
> >                                  __pthread_mutex_unlock_usercnt();
> > 
> >                                                         __pthread_cond_signal()
> >                                                           if ((cond->__data.__g_size[g1] != 0) …
> > 
> >                                                            /* Add a signal. */
> >                                                            atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, 2);
> > 
> >  signals = atomic_load_acquire (cond->__data.__g_signals + g);
> >                                  signals = atomic_load_acquire (cond->__data.__g_signals + g);
> > 
> >   signals = 2                     signals = 2
> > 
> >           /* If there is an available signal, don't block.  */
> >   if (signals != 0)               if (signals != 0)
> >       break;                         break;
> > 
> >   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
> >                                                  &signals, signals - 2));
> >   signals = 0
> >                                   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
> >                                                 &signals, signals - 2));
> >                                    signals = (int) -2
> 
> The second CAS would expect a value of 2 in *(cond->__data.__g_signals +
> g).  Because the first CAS succeeded in your example, the second CAS
> will fail, and T2 will update its view of __g_signals and store that in
> signals.

So T1 stores 2 -> 0 in __g_signals + g as expected.
T2 tries to store 2 -> 0 in __g_signals + g, which fails because
*(__g_signals + g) != signals (0 != 2). However,
atomic_compare_exchange_weak_acquire() will update signals to 0, right?
So the second iteration of the while loop will update __g_signals + g from
0 to -2. T2 won't loop until __g_signals becomes 2 again, right?

Sebastian


* Re: Is this pthread_cond_wait() race possible
  2017-06-08  7:39   ` Sebastian Andrzej Siewior
@ 2017-06-08  9:51     ` Torvald Riegel
  2017-06-08 12:16       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 9+ messages in thread
From: Torvald Riegel @ 2017-06-08  9:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

On Thu, 2017-06-08 at 09:39 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-07 21:41:38 [+0200], Torvald Riegel wrote:
> > On Wed, 2017-06-07 at 18:48 +0200, Sebastian Andrzej Siewior wrote:
> > > Hi Torvald,
> > > 
> > > I've been staring at the new pthread_cond_* code. Could you please tell
> > > me whether this race is possible (two waiting threads see the same
> > > __g_signals value while a third thread posts a signal, before the
> > > waiters go to futex_wait()):
> > > 
> > > T1                               T2                       T3
> > > 
> > > __pthread_cond_wait_common()                            
> > >  __condvar_fetch_add_wseq_acquire (cond, 2);
> > >  __pthread_mutex_unlock_usercnt();                       
> > > 
> > >                                 __pthread_cond_wait_common()
> > >                                  __condvar_fetch_add_wseq_acquire (cond, 2);
> > >                                  __pthread_mutex_unlock_usercnt();
> > > 
> > >                                                         __pthread_cond_signal()
> > >                                                           if ((cond->__data.__g_size[g1] != 0) …
> > > 
> > >                                                            /* Add a signal. */
> > >                                                            atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, 2);
> > > 
> > >  signals = atomic_load_acquire (cond->__data.__g_signals + g);
> > >                                  signals = atomic_load_acquire (cond->__data.__g_signals + g);
> > > 
> > >   signals = 2                     signals = 2
> > > 
> > >           /* If there is an available signal, don't block.  */
> > >   if (signals != 0)               if (signals != 0)
> > >       break;                         break;
> > > 
> > >   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
> > >                                                  &signals, signals - 2));
> > >   signals = 0
> > >                                   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
> > >                                                 &signals, signals - 2));
> > >                                    signals = (int) -2
> > 
> > The second CAS would expect a value of 2 in *(cond->__data.__g_signals +
> > g).  Because the first CAS succeeded in your example, the second CAS
> > will fail, and T2 will update its view of __g_signals and store that in
> > signals.
> 
> So T1 stores 2 -> 0 in __g_signals + g as expected.
> T2 tries to store 2 -> 0 in __g_signals + g, which fails because
> *(__g_signals + g) != signals (0 != 2). However,
> atomic_compare_exchange_weak_acquire() will update signals to 0, right?
> So the second iteration of the while loop will update __g_signals + g from
> 0 to -2. T2 won't loop until __g_signals becomes 2 again, right?

No, the second iteration of the outer do-while loop will just run the
inner while loop; the inner loop doesn't exit until signals is non-zero
(or the group is closed or a timeout occurs, but then the CAS isn't
executed).
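
Schematically, the nesting looks like this (a simplified sketch only:
spin-waiting, group closing, cancellation, and timeout handling are
stripped out, and wait_for_signal/futex_wait_stub are placeholder names,
not the real glibc code):

/* Simplified sketch of the loop nesting in __pthread_cond_wait_common.  */
#include <stdatomic.h>

static void
futex_wait_stub (atomic_uint *addr, unsigned int val)
{
  /* Stands in for futex_wait (addr, val), which blocks only while
     *addr == val.  */
  (void) addr;
  (void) val;
}

static void
wait_for_signal (atomic_uint *g_signals)
{
  unsigned int signals
    = atomic_load_explicit (g_signals, memory_order_acquire);

  do
    {
      /* Inner loop: does not exit until a signal is available.  */
      while (1)
        {
          /* If there is an available signal, don't block.  */
          if (signals != 0)
            break;
          futex_wait_stub (g_signals, 0);
          signals = atomic_load_explicit (g_signals, memory_order_acquire);
        }
    }
  /* Outer loop condition: try to grab a signal.  On failure, "signals"
     is refreshed with the current value and the inner loop runs again
     (and may block), rather than the CAS being retried with a stale
     value.  */
  while (!atomic_compare_exchange_weak_explicit (g_signals, &signals,
                                                 signals - 2,
                                                 memory_order_acquire,
                                                 memory_order_relaxed));
}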



* Re: Is this pthread_cond_wait() race possible
  2017-06-08  9:51     ` Torvald Riegel
@ 2017-06-08 12:16       ` Sebastian Andrzej Siewior
  2017-06-08 12:28         ` Torvald Riegel
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-08 12:16 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

On 2017-06-08 11:51:08 [+0200], Torvald Riegel wrote:
> No, the second iteration of the outer do-while loop will just run the
> inner while loop; the inner loop doesn't exit until signals is non-zero
> (or the group is closed or a timeout occurs, but then the CAS isn't
> executed).

Okay. The while loop is within the do loop, and
atomic_compare_exchange_weak_acquire() is part of the outer do loop's
condition; it looked like it was there all by itself…
Thanks for clearing that up. So based on this, the waiter does no
locking except for its atomic ops, which should neither block nor loop
for a long time waiting on a condition.
If that is the case, then it looks good.

The counterpart (pthread_cond_signal()) uses __condvar_acquire_lock(),
which is a handcrafted lock with atomics and two bits. Do you have an
integer field left? This seems to be the only part that would break PI.
That is, a low-priority thread that owns the lock (and is scheduled out)
will make every subsequent thread end up in futex_wait(). An integer with
futex_lock_pi() should fix that.
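
Roughly what I have in mind, as a sketch only (it assumes a spare 32-bit
field, called lockword here, could be found in the condvar;
condvar_lock_pi/condvar_unlock_pi are made-up names and error handling
(EINTR, EDEADLK, ...) is omitted):

#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long
sys_futex (unsigned int *uaddr, int op, unsigned int val)
{
  return syscall (SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void
condvar_lock_pi (atomic_uint *lockword)
{
  unsigned int zero = 0;
  unsigned int tid = (unsigned int) syscall (SYS_gettid);

  /* Fast path: uncontended acquisition stays in userspace.  */
  if (atomic_compare_exchange_strong (lockword, &zero, tid))
    return;

  /* Slow path: the kernel records us as a waiter and boosts the
     current owner if our priority is higher.  */
  while (sys_futex ((unsigned int *) lockword, FUTEX_LOCK_PI, 0) != 0)
    continue;
}

static void
condvar_unlock_pi (atomic_uint *lockword)
{
  unsigned int tid = (unsigned int) syscall (SYS_gettid);

  /* Fast path: no waiters are queued (FUTEX_WAITERS is not set).  */
  if (!atomic_compare_exchange_strong (lockword, &tid, 0))
    sys_futex ((unsigned int *) lockword, FUTEX_UNLOCK_PI, 0);
}

The uncontended path never enters the kernel; only contention pays for
the FUTEX_LOCK_PI syscall.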

Sebastian


* Re: Is this pthread_cond_wait() race possible
  2017-06-08 12:16       ` Sebastian Andrzej Siewior
@ 2017-06-08 12:28         ` Torvald Riegel
  2017-06-08 15:45           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 9+ messages in thread
From: Torvald Riegel @ 2017-06-08 12:28 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

On Thu, 2017-06-08 at 14:16 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-08 11:51:08 [+0200], Torvald Riegel wrote:
> > No, the second iteration of the outer do-while loop will just run the
> > inner while loop; the inner loop doesn't exit until signals is non-zero
> > (or the group is closed or a timeout occurs, but then the CAS isn't
> > executed).
> 
> Okay. The while loop is within the do loop, and
> atomic_compare_exchange_weak_acquire() is part of the outer do loop's
> condition; it looked like it was there all by itself…
> Thanks for clearing that up. So based on this, the waiter does no
> locking except for its atomic ops, which should neither block nor loop
> for a long time waiting on a condition.
> If that is the case, then it looks good.
> 
> The counterpart (pthread_cond_signal()) uses __condvar_acquire_lock()
> which is a handcrafted lock with atomics and two bits. Do you have an
> integer field left?

No, otherwise I would have avoided the complexity of embedding the lock
bits :)  One could rearrange the bits in some other way to get a full
32b for the PI lock -- but I don't think that's the fundamental problem.

> This seems to be the only part that would break PI. That is, a
> low-priority thread that owns the lock (and is scheduled out) will
> make every subsequent thread end up in futex_wait(). An integer with
> futex_lock_pi() should fix that.

Have you looked at the presentation I gave at Real-Time Summit last
year?  I think the problem we're facing re PI support is more
fundamental than getting a PI lock in there.



* Re: Is this pthread_cond_wait() race possible
  2017-06-08 12:28         ` Torvald Riegel
@ 2017-06-08 15:45           ` Sebastian Andrzej Siewior
  2017-06-08 16:51             ` Torvald Riegel
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-08 15:45 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

On 2017-06-08 14:28:09 [+0200], Torvald Riegel wrote:
> > The counterpart (pthread_cond_signal()) uses __condvar_acquire_lock()
> > which is a handcrafted lock with atomics and two bits. Do you have an
> > integer field left?
> 
> No, otherwise I would have avoided the complexity of embedding the lock
> bits :)  One could rearrange the bits in some other way to get a full
> 32b for the PI lock -- but I don't think that's the fundamental problem.
> 
> > This seems to be the only part that would break PI. That is, a
> > low-priority thread that owns the lock (and is scheduled out) will
> > make every subsequent thread end up in futex_wait(). An integer with
> > futex_lock_pi() should fix that.
> 
> Have you looked at the presentation I gave at Real-Time Summit last
> year?  I think the problem we're facing re PI support is more
> fundamental than getting a PI lock in there.

You refer to
  https://wiki.linuxfoundation.org/_media/realtime/events/rt-summit2016/pthread-condvars-posix-compliance-and-the-pi-gap_darren-hart_torvald-riegel.pdf

the last two slides?
FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI handled the user-mutex.
This is gone now. I don't know how much trouble it will cause us. Let's
see…
In the old implementation if you had a high-prio task in
pthread_cond_wait() then it already had the mutex on its return path and
it would boost the waker if needed. We don't have this "fast" kernel
boost, only after the task has been woken up and is going for the
userland mutex (which should be owned by the pthread_cond_signal()
caller). So this context switch is not optimal. *But* the signal path
does not hold any locks which made you lose PI while trying to obtain
the internal condvar mutex via futex_wait().
In the broadcast case all threads will compete for the one lock, so
thundering herd here we are. But ideally the thread with the highest
priority should get the lock first.
So based on that, the only thing I see is that you end up in
futex_wait() via the one lock in pthread_cond_signal() (or the
pthread_cond_timedwait() ETIMEDOUT path).

Sebastian


* Re: Is this pthread_cond_wait() race possible
  2017-06-08 15:45           ` Sebastian Andrzej Siewior
@ 2017-06-08 16:51             ` Torvald Riegel
  2017-06-30 15:29               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 9+ messages in thread
From: Torvald Riegel @ 2017-06-08 16:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

On Thu, 2017-06-08 at 17:45 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-08 14:28:09 [+0200], Torvald Riegel wrote:
> > > The counterpart (pthread_cond_signal()) uses __condvar_acquire_lock()
> > > which is a handcrafted lock with atomics and two bits. Do you have an
> > > integer field left?
> > 
> > No, otherwise I would have avoided the complexity of embedding the lock
> > bits :)  One could rearrange the bits in some other way to get a full
> > 32b for the PI lock -- but I don't think that's the fundamental problem.
> > 
> > > This seems to be the only part that would break PI. That is, a
> > > low-priority thread that owns the lock (and is scheduled out) will
> > > make every subsequent thread end up in futex_wait(). An integer with
> > > futex_lock_pi() should fix that.
> > 
> > Have you looked at the presentation I gave at Real-Time Summit last
> > year?  I think the problem we're facing re PI support is more
> > fundamental than getting a PI lock in there.
> 
> You refer to
>   https://wiki.linuxfoundation.org/_media/realtime/events/rt-summit2016/pthread-condvars-posix-compliance-and-the-pi-gap_darren-hart_torvald-riegel.pdf
> 
> the last two slides?

Yes.  Do these bring across what the underlying algorithmic problem is?

> FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI handled the user-mutex.
> This is gone now. I don't know how much trouble it will cause us. Let's
> see…
> In the old implementation

The old implementation did not handle all cases correctly.

> if you had a high-prio task in
> pthread_cond_wait() then it already had the mutex on its return path and
> it would boost the waker if needed. We don't have this "fast" kernel
> boost, only after the task has been woken up and is going for the
> userland mutex (which should be owned by the pthread_cond_signal()
> caller). So this context switch is not optimal. *But* the signal path
> does not hold any locks which made you lose PI while trying to obtain
> the internal condvar mutex via futex_wait().

I can't follow you.  What are you trying to say?

> In the broadcast case all threads will compete for the one lock, so
> thundering herd here we are. But ideally the thread with the highest
> priority should get the lock first.
> So based on that, the only thing I see is that you end up in
> futex_wait() via the one lock in pthread_cond_signal() (or the
> pthread_cond_timedwait() ETIMEDOUT path).

I can't follow.  What is the "thing" you see, a problem, or specific
execution, or something else?

I think it would make sense to focus on correctness first.



* Re: Is this pthread_cond_wait() race possible
  2017-06-08 16:51             ` Torvald Riegel
@ 2017-06-30 15:29               ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-30 15:29 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: libc-alpha, tglx, Peter Zijlstra, Darren Hart

On 2017-06-08 18:50:58 [+0200], Torvald Riegel wrote:
> On Thu, 2017-06-08 at 17:45 +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-06-08 14:28:09 [+0200], Torvald Riegel wrote:
> > > > The counterpart (pthread_cond_signal()) uses __condvar_acquire_lock()
> > > > which is a handcrafted lock with atomics and two bits. Do you have an
> > > > integer field left?
> > > 
> > > No, otherwise I would have avoided the complexity of embedding the lock
> > > bits :)  One could rearrange the bits in some other way to get a full
> > > 32b for the PI lock -- but I don't think that's the fundamental problem.
> > > 
> > > > This seems to be the only part that would break PI. That is, a
> > > > low-priority thread that owns the lock (and is scheduled out) will
> > > > make every subsequent thread end up in futex_wait(). An integer with
> > > > futex_lock_pi() should fix that.
> > > 
> > > Have you looked at the presentation I gave at Real-Time Summit last
> > > year?  I think the problem we're facing re PI support is more
> > > fundamental than getting a PI lock in there.
> > 
> > You refer to
> >   https://wiki.linuxfoundation.org/_media/realtime/events/rt-summit2016/pthread-condvars-posix-compliance-and-the-pi-gap_darren-hart_torvald-riegel.pdf
> > 
> > the last two slides?
> 
> Yes.  Do these bring across what the underlying algorithmic problem is?

Unfortunately not.

> > FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI handled the user-mutex.
> > This is gone now. I don't know how much trouble it will cause us. Let's
> > see…
> > In the old implementation
> 
> The old implementation did not handle all cases correctly.
I do not question this. I wanted to point out that in the old code we
had the PI part.

> > if you had a high-prio task in
> > pthread_cond_wait() then it already had the mutex on its return path and
> > it would boost the waker if needed. We don't have this "fast" kernel
> > boost, only after the task has been woken up and is going for the
> > userland mutex (which should be owned by the pthread_cond_signal()
> > caller). So this context switch is not optimal. *But* the signal path
> > does not hold any locks which made you lose PI while trying to obtain
> > the internal condvar mutex via futex_wait().
> 
> I can't follow you.  What are you trying to say?

Say you have three threads with low, medium, and high priority.
Thread L gets the lock (the self-made one built from the two bits).
Thread M pushes L away and does something else.
Thread H wants the lock L is holding. Instead of boosting L's priority
(as we would have with a PI mutex), it just goes into FUTEX_WAIT and
waits, and M gets on the CPU. This means that H waits until L is runnable
again.
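
To illustrate, here is the shape of that slow path (a sketch, not the
literal two-bit code from __condvar_acquire_lock): with a plain
FUTEX_WAIT the kernel is only given an address and an expected value,
no owner TID, so it cannot boost L:

/* Sketch of the shape of the problem: a non-PI lock built from a bit in
   a shared word.  When the lock is taken, all a waiter can do is
   FUTEX_WAIT on the word; the kernel has no owner information, so no
   priority inheritance happens and H just sleeps while M runs.  (The
   real code also tracks waiters so the unlock side knows when to
   futex-wake; that part is omitted here.)  */
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define LOCK_BIT 1u

static void
non_pi_lock (atomic_uint *word)
{
  while (1)
    {
      unsigned int old = atomic_fetch_or_explicit (word, LOCK_BIT,
                                                   memory_order_acquire);
      if ((old & LOCK_BIT) == 0)
        return;  /* We took the lock.  */

      /* Lock is held; neither we nor the kernel know by whom, so no
         boosting is possible -- we just sleep until the word changes.  */
      syscall (SYS_futex, (unsigned int *) word, FUTEX_WAIT, old,
               NULL, NULL, 0);
    }
}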

> > In the broadcast case all threads will compete for the one lock, so
> > thundering herd here we are. But ideally the thread with the highest
> > priority should get the lock first.
> > So based on that, the only thing I see is that you end up in
> > futex_wait() via the one lock in pthread_cond_signal() (or the
> > pthread_cond_timedwait() ETIMEDOUT path).
> 
> I can't follow.  What is the "thing" you see, a problem, or specific
> execution, or something else?

Based on review, currently it is the two-bit lock.

> I think it would make sense to focus on correctness first.
What is still missing, correctness-wise?

Sebastian


end of thread

Thread overview: 9+ messages
2017-06-07 16:48 Is this pthread_cond_wait() race possible Sebastian Andrzej Siewior
2017-06-07 19:41 ` Torvald Riegel
2017-06-08  7:39   ` Sebastian Andrzej Siewior
2017-06-08  9:51     ` Torvald Riegel
2017-06-08 12:16       ` Sebastian Andrzej Siewior
2017-06-08 12:28         ` Torvald Riegel
2017-06-08 15:45           ` Sebastian Andrzej Siewior
2017-06-08 16:51             ` Torvald Riegel
2017-06-30 15:29               ` Sebastian Andrzej Siewior
