public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847)
@ 2021-01-16 20:49 Malte Skarupke
  2021-01-16 20:49 ` [PATCH 2/5] nptl: Remove the signal-stealing code. It is no longer needed Malte Skarupke
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Malte Skarupke @ 2021-01-16 20:49 UTC (permalink / raw)
  To: libc-alpha; +Cc: malteskarupke, triegel, Malte Skarupke

There was a rare bug in pthread_cond_wait's handling of the case when
a signal was stolen because a waiter took a long time to leave
pthread_cond_wait.

I wrote about the bug here:
https://probablydance.com/2020/10/31/using-tla-in-the-real-world-to-understand-a-glibc-bug/

The bug was subtle and only happened in an edge-case of an edge-case
so rather than fixing it, I decided to remove the outer edge-case:
By broadening the scope of grefs, stealing of signals becomes
impossible. A signaling thread will always wait for all waiters to
leave pthread_cond_wait before closing a group, so now no waiter from
the past can come back and steal a signal from a future group.

This change is the minimal amount of changes necessary to fix the bug.
This leads to slightly slower performance, but the next two patches
in this series will undo most of that damage.
---
 nptl/pthread_cond_wait.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 02d11c61db..0f50048c0b 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -405,6 +405,10 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   unsigned int g = wseq & 1;
   uint64_t seq = wseq >> 1;

+  /* Acquire a group reference and use acquire MO for that so that we
+     synchronize with the dummy read-modify-write in
+     __condvar_quiesce_and_switch_g1 if we read from that.  */
+  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
   /* Increase the waiter reference count.  Relaxed MO is sufficient because
      we only need to synchronize when decrementing the reference count.  */
   unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
@@ -422,6 +426,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
     {
       __condvar_cancel_waiting (cond, seq, g, private);
       __condvar_confirm_wakeup (cond, private);
+      __condvar_dec_grefs (cond, g, private);
       return err;
     }

@@ -471,24 +476,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	    break;

 	  /* No signals available after spinning, so prepare to block.
-	     We first acquire a group reference and use acquire MO for that so
-	     that we synchronize with the dummy read-modify-write in
-	     __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
-	     in this case this will make us see the closed flag on __g_signals
-	     that designates a concurrent attempt to reuse the group's slot.
-	     We use acquire MO for the __g_signals check to make the
-	     __g1_start check work (see spinning above).
-	     Note that the group reference acquisition will not mask the
-	     release MO when decrementing the reference count because we use
-	     an atomic read-modify-write operation and thus extend the release
-	     sequence.  */
-	  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
+	     First check the closed flag on __g_signals that designates a
+	     concurrent attempt to reuse the group's slot. We use acquire MO for
+	     the __g_signals check to make the __g1_start check work (see
+	     spinning above).  */
 	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
 	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
 	    {
-	      /* Our group is closed.  Wake up any signalers that might be
-		 waiting.  */
-	      __condvar_dec_grefs (cond, g, private);
+	      /* Our group is closed.  */
 	      goto done;
 	    }

@@ -508,7 +503,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,

 	  if (__glibc_unlikely (err == ETIMEDOUT || err == EOVERFLOW))
 	    {
-	      __condvar_dec_grefs (cond, g, private);
 	      /* If we timed out, we effectively cancel waiting.  Note that
 		 we have decremented __g_refs before cancellation, so that a
 		 deadlock between waiting for quiescence of our group in
@@ -518,8 +512,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	      result = err;
 	      goto done;
 	    }
-	  else
-	    __condvar_dec_grefs (cond, g, private);

 	  /* Reload signals.  See above for MO.  */
 	  signals = atomic_load_acquire (cond->__data.__g_signals + g);
@@ -602,6 +594,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      to allow for execution of pthread_cond_destroy while having acquired the
      mutex.  */
   __condvar_confirm_wakeup (cond, private);
+  __condvar_dec_grefs (cond, g, private);

   /* Woken up; now re-acquire the mutex.  If this doesn't fail, return RESULT,
      which is set to ETIMEDOUT if a timeout occured, or zero otherwise.  */
--
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-01-18 23:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-16 20:49 [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Malte Skarupke
2021-01-16 20:49 ` [PATCH 2/5] nptl: Remove the signal-stealing code. It is no longer needed Malte Skarupke
2021-01-16 20:49 ` [PATCH 3/5] nptl: Optimization by not incrementing wrefs in pthread_cond_wait Malte Skarupke
2021-01-18 23:41   ` Torvald Riegel
2021-01-16 20:49 ` [PATCH 4/5] nptl: Make test-cond-printers check the number of waiters Malte Skarupke
2021-01-16 20:49 ` [PATCH 5/5] nptl: Rename __wrefs to __flags because its meaning has changed Malte Skarupke
2021-01-18 23:47   ` Torvald Riegel
2021-01-18 22:43 ` [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Torvald Riegel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).