[PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847)

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847)
@ 2021-01-16 20:49 Malte Skarupke
  2021-01-16 20:49 ` [PATCH 2/5] nptl: Remove the signal-stealing code. It is no longer needed Malte Skarupke
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Malte Skarupke @ 2021-01-16 20:49 UTC (permalink / raw)
  To: libc-alpha; +Cc: malteskarupke, triegel, Malte Skarupke

There was a rare bug in pthread_cond_wait's handling of the case when
a signal was stolen because a waiter took a long time to leave
pthread_cond_wait.

I wrote about the bug here:
https://probablydance.com/2020/10/31/using-tla-in-the-real-world-to-understand-a-glibc-bug/

The bug was subtle and only happened in an edge-case of an edge-case
so rather than fixing it, I decided to remove the outer edge-case:
By broadening the scope of grefs, stealing of signals becomes
impossible. A signaling thread will always wait for all waiters to
leave pthread_cond_wait before closing a group, so now no waiter from
the past can come back and steal a signal from a future group.

This change is the minimal amount of changes necessary to fix the bug.
This leads to slightly slower performance, but the next two patches
in this series will undo most of that damage.
---
 nptl/pthread_cond_wait.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 02d11c61db..0f50048c0b 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -405,6 +405,10 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   unsigned int g = wseq & 1;
   uint64_t seq = wseq >> 1;

+  /* Acquire a group reference and use acquire MO for that so that we
+     synchronize with the dummy read-modify-write in
+     __condvar_quiesce_and_switch_g1 if we read from that.  */
+  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
   /* Increase the waiter reference count.  Relaxed MO is sufficient because
      we only need to synchronize when decrementing the reference count.  */
   unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
@@ -422,6 +426,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
     {
       __condvar_cancel_waiting (cond, seq, g, private);
       __condvar_confirm_wakeup (cond, private);
+      __condvar_dec_grefs (cond, g, private);
       return err;
     }

@@ -471,24 +476,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	    break;

 	  /* No signals available after spinning, so prepare to block.
-	     We first acquire a group reference and use acquire MO for that so
-	     that we synchronize with the dummy read-modify-write in
-	     __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
-	     in this case this will make us see the closed flag on __g_signals
-	     that designates a concurrent attempt to reuse the group's slot.
-	     We use acquire MO for the __g_signals check to make the
-	     __g1_start check work (see spinning above).
-	     Note that the group reference acquisition will not mask the
-	     release MO when decrementing the reference count because we use
-	     an atomic read-modify-write operation and thus extend the release
-	     sequence.  */
-	  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
+	     First check the closed flag on __g_signals that designates a
+	     concurrent attempt to reuse the group's slot. We use acquire MO for
+	     the __g_signals check to make the __g1_start check work (see
+	     spinning above).  */
 	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
 	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
 	    {
-	      /* Our group is closed.  Wake up any signalers that might be
-		 waiting.  */
-	      __condvar_dec_grefs (cond, g, private);
+	      /* Our group is closed.  */
 	      goto done;
 	    }

@@ -508,7 +503,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,

 	  if (__glibc_unlikely (err == ETIMEDOUT || err == EOVERFLOW))
 	    {
-	      __condvar_dec_grefs (cond, g, private);
 	      /* If we timed out, we effectively cancel waiting.  Note that
 		 we have decremented __g_refs before cancellation, so that a
 		 deadlock between waiting for quiescence of our group in
@@ -518,8 +512,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	      result = err;
 	      goto done;
 	    }
-	  else
-	    __condvar_dec_grefs (cond, g, private);

 	  /* Reload signals.  See above for MO.  */
 	  signals = atomic_load_acquire (cond->__data.__g_signals + g);
@@ -602,6 +594,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      to allow for execution of pthread_cond_destroy while having acquired the
      mutex.  */
   __condvar_confirm_wakeup (cond, private);
+  __condvar_dec_grefs (cond, g, private);

   /* Woken up; now re-acquire the mutex.  If this doesn't fail, return RESULT,
      which is set to ETIMEDOUT if a timeout occured, or zero otherwise.  */
--
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/5] nptl: Remove the signal-stealing code. It is no longer needed.
  2021-01-16 20:49 [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Malte Skarupke
@ 2021-01-16 20:49 ` Malte Skarupke
  2021-01-16 20:49 ` [PATCH 3/5] nptl: Optimization by not incrementing wrefs in pthread_cond_wait Malte Skarupke
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Malte Skarupke @ 2021-01-16 20:49 UTC (permalink / raw)
  To: libc-alpha; +Cc: malteskarupke, triegel, Malte Skarupke

After my last change, stealing of signals can no longer happen. This
patch removes the code that handled the case when a signal was stolen.
---
 nptl/pthread_cond_wait.c | 63 ----------------------------------------
 1 file changed, 63 deletions(-)

diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 0f50048c0b..7b825f81df 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -525,69 +525,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
 						&signals, signals - 2));

-  /* We consumed a signal but we could have consumed from a more recent group
-     that aliased with ours due to being in the same group slot.  If this
-     might be the case our group must be closed as visible through
-     __g1_start.  */
-  uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
-  if (seq < (g1_start >> 1))
-    {
-      /* We potentially stole a signal from a more recent group but we do not
-	 know which group we really consumed from.
-	 We do not care about groups older than current G1 because they are
-	 closed; we could have stolen from these, but then we just add a
-	 spurious wake-up for the current groups.
-	 We will never steal a signal from current G2 that was really intended
-	 for G2 because G2 never receives signals (until it becomes G1).  We
-	 could have stolen a signal from G2 that was conservatively added by a
-	 previous waiter that also thought it stole a signal -- but given that
-	 that signal was added unnecessarily, it's not a problem if we steal
-	 it.
-	 Thus, the remaining case is that we could have stolen from the current
-	 G1, where "current" means the __g1_start value we observed.  However,
-	 if the current G1 does not have the same slot index as we do, we did
-	 not steal from it and do not need to undo that.  This is the reason
-	 for putting a bit with G2's index into__g1_start as well.  */
-      if (((g1_start & 1) ^ 1) == g)
-	{
-	  /* We have to conservatively undo our potential mistake of stealing
-	     a signal.  We can stop trying to do that when the current G1
-	     changes because other spinning waiters will notice this too and
-	     __condvar_quiesce_and_switch_g1 has checked that there are no
-	     futex waiters anymore before switching G1.
-	     Relaxed MO is fine for the __g1_start load because we need to
-	     merely be able to observe this fact and not have to observe
-	     something else as well.
-	     ??? Would it help to spin for a little while to see whether the
-	     current G1 gets closed?  This might be worthwhile if the group is
-	     small or close to being closed.  */
-	  unsigned int s = atomic_load_relaxed (cond->__data.__g_signals + g);
-	  while (__condvar_load_g1_start_relaxed (cond) == g1_start)
-	    {
-	      /* Try to add a signal.  We don't need to acquire the lock
-		 because at worst we can cause a spurious wake-up.  If the
-		 group is in the process of being closed (LSB is true), this
-		 has an effect similar to us adding a signal.  */
-	      if (((s & 1) != 0)
-		  || atomic_compare_exchange_weak_relaxed
-		       (cond->__data.__g_signals + g, &s, s + 2))
-		{
-		  /* If we added a signal, we also need to add a wake-up on
-		     the futex.  We also need to do that if we skipped adding
-		     a signal because the group is being closed because
-		     while __condvar_quiesce_and_switch_g1 could have closed
-		     the group, it might stil be waiting for futex waiters to
-		     leave (and one of those waiters might be the one we stole
-		     the signal from, which cause it to block using the
-		     futex).  */
-		  futex_wake (cond->__data.__g_signals + g, 1, private);
-		  break;
-		}
-	      /* TODO Back off.  */
-	    }
-	}
-    }
-
  done:

   /* Confirm that we have been woken.  We do that before acquiring the mutex
--
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/5] nptl: Optimization by not incrementing wrefs in pthread_cond_wait
  2021-01-16 20:49 [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Malte Skarupke
  2021-01-16 20:49 ` [PATCH 2/5] nptl: Remove the signal-stealing code. It is no longer needed Malte Skarupke
@ 2021-01-16 20:49 ` Malte Skarupke
  2021-01-18 23:41   ` Torvald Riegel
  2021-01-16 20:49 ` [PATCH 4/5] nptl: Make test-cond-printers check the number of waiters Malte Skarupke
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Malte Skarupke @ 2021-01-16 20:49 UTC (permalink / raw)
  To: libc-alpha; +Cc: malteskarupke, triegel, Malte Skarupke

After I broadened the scope of grefs, it covered mostly the same scope
as wrefs. The duplicate atomic increment/decrement was unnecessary. In
this patch I remove the increment/decrement of wrefs.

One exception is the case when pthread_cancel is handled. The
interaction between __condvar_cleanup_waiting and
pthread_cond_destroy is complicated and required both variables. So in
order to preserve the existing behavior, I now increment/decrement
wrefs in __condvar_cleanup_waiting.
---
 nptl/nptl-printers.py          |  5 +++-
 nptl/nptl_lock_constants.pysym |  2 +-
 nptl/pthread_cond_broadcast.c  |  8 +++---
 nptl/pthread_cond_destroy.c    | 29 ++++++++++++++++------
 nptl/pthread_cond_signal.c     |  8 +++---
 nptl/pthread_cond_wait.c       | 45 ++++++++++++++++------------------
 6 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/nptl/nptl-printers.py b/nptl/nptl-printers.py
index e794034d83..1a404befe5 100644
--- a/nptl/nptl-printers.py
+++ b/nptl/nptl-printers.py
@@ -313,6 +313,7 @@ class ConditionVariablePrinter(object):

         data = cond['__data']
         self.wrefs = data['__wrefs']
+        self.grefs = data['__g_refs']
         self.values = []

         self.read_values()
@@ -350,8 +351,10 @@ class ConditionVariablePrinter(object):
         are waiting for it.
         """

+        num_readers_g0 = self.grefs[0] >> PTHREAD_COND_GREFS_SHIFT
+        num_readers_g1 = self.grefs[1] >> PTHREAD_COND_GREFS_SHIFT
         self.values.append(('Threads known to still execute a wait function',
-                            self.wrefs >> PTHREAD_COND_WREFS_SHIFT))
+                            num_readers_g0 + num_readers_g1))

     def read_attributes(self):
         """Read the condvar's attributes."""
diff --git a/nptl/nptl_lock_constants.pysym b/nptl/nptl_lock_constants.pysym
index ade4398e0c..2141cfa1f0 100644
--- a/nptl/nptl_lock_constants.pysym
+++ b/nptl/nptl_lock_constants.pysym
@@ -50,7 +50,7 @@ PTHREAD_COND_SHARED_MASK          __PTHREAD_COND_SHARED_MASK
 PTHREAD_COND_CLOCK_MONOTONIC_MASK __PTHREAD_COND_CLOCK_MONOTONIC_MASK
 COND_CLOCK_BITS
 -- These values are hardcoded:
-PTHREAD_COND_WREFS_SHIFT          3
+PTHREAD_COND_GREFS_SHIFT          1

 -- Rwlock attributes
 PTHREAD_RWLOCK_PREFER_READER_NP
diff --git a/nptl/pthread_cond_broadcast.c b/nptl/pthread_cond_broadcast.c
index 8d887aab93..e10432ce7c 100644
--- a/nptl/pthread_cond_broadcast.c
+++ b/nptl/pthread_cond_broadcast.c
@@ -40,10 +40,12 @@ __pthread_cond_broadcast (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_broadcast, 1, cond);

-  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
-  if (wrefs >> 3 == 0)
+  unsigned int grefs0 = atomic_load_relaxed (cond->__data.__g_refs);
+  unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
+  if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);

   __condvar_acquire_lock (cond, private);

diff --git a/nptl/pthread_cond_destroy.c b/nptl/pthread_cond_destroy.c
index 31034905d1..1c27385f89 100644
--- a/nptl/pthread_cond_destroy.c
+++ b/nptl/pthread_cond_destroy.c
@@ -37,22 +37,35 @@
    signal or broadcast calls.
    Thus, we can assume that all waiters that are still accessing the condvar
    have been woken.  We wait until they have confirmed to have woken up by
-   decrementing __wrefs.  */
+   decrementing __g_refs.  */
 int
 __pthread_cond_destroy (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_destroy, 1, cond);

-  /* Set the wake request flag.  We could also spin, but destruction that is
-     concurrent with still-active waiters is probably neither common nor
-     performance critical.  Acquire MO to synchronize with waiters confirming
-     that they finished.  */
-  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4);
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
+  for (unsigned g = 0; g < 2; ++g)
+    {
+      while (true)
+	{
+	  /* Set the wake request flag.  We could also spin, but destruction that is
+	     concurrent with still-active waiters is probably neither common nor
+	     performance critical.  Acquire MO to synchronize with waiters confirming
+	     that they finished.  */
+	  unsigned r = atomic_fetch_or_acquire (cond->__data.__g_refs + g, 1) | 1;
+	  if (r == 1)
+	    break;
+	  futex_wait_simple (cond->__data.__g_refs + g, r, private);
+	}
+    }
+
+  /* Same as above, except to synchronize with canceled threads.  This wake
+     flag never gets cleared, so it's enough to set it once.  */
+  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4) | 4;
   while (wrefs >> 3 != 0)
     {
       futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
-      /* See above.  */
       wrefs = atomic_load_acquire (&cond->__data.__wrefs);
     }
   /* The memory the condvar occupies can now be reused.  */
diff --git a/nptl/pthread_cond_signal.c b/nptl/pthread_cond_signal.c
index 4281ad4d3b..0cd534cc40 100644
--- a/nptl/pthread_cond_signal.c
+++ b/nptl/pthread_cond_signal.c
@@ -39,10 +39,12 @@ __pthread_cond_signal (pthread_cond_t *cond)
   /* First check whether there are waiters.  Relaxed MO is fine for that for
      the same reasons that relaxed MO is fine when observing __wseq (see
      below).  */
-  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
-  if (wrefs >> 3 == 0)
+  unsigned int grefs0 = atomic_load_relaxed (cond->__data.__g_refs);
+  unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
+  if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);

   __condvar_acquire_lock (cond, private);

diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 7b825f81df..0ee0247874 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -43,19 +43,6 @@ struct _condvar_cleanup_buffer
 };


-/* Decrease the waiter reference count.  */
-static void
-__condvar_confirm_wakeup (pthread_cond_t *cond, int private)
-{
-  /* If destruction is pending (i.e., the wake-request flag is nonzero) and we
-     are the last waiter (prior value of __wrefs was 1 << 3), then wake any
-     threads waiting in pthread_cond_destroy.  Release MO to synchronize with
-     these threads.  Don't bother clearing the wake-up request flag.  */
-  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) == 3)
-    futex_wake (&cond->__data.__wrefs, INT_MAX, private);
-}
-
-
 /* Cancel waiting after having registered as a waiter previously.  SEQ is our
    position and G is our group index.
    The goal of cancellation is to make our group smaller if that is still
@@ -81,7 +68,7 @@ __condvar_cancel_waiting (pthread_cond_t *cond, uint64_t seq, unsigned int g,
 {
   bool consumed_signal = false;

-  /* No deadlock with group switching is possible here because we have do
+  /* No deadlock with group switching is possible here because we do
      not hold a reference on the group.  */
   __condvar_acquire_lock (cond, private);

@@ -172,6 +159,14 @@ __condvar_cleanup_waiting (void *arg)
   pthread_cond_t *cond = cbuffer->cond;
   unsigned g = cbuffer->wseq & 1;

+  /* Normally we are not allowed to touch cond any more after calling
+     __condvar_dec_grefs, because pthread_cond_destroy looks at __g_refs to
+     determine when all waiters have woken. Since we will do more work in this
+     function, we are using an extra channel to communicate to
+     pthread_cond_destroy that it is not allowed to finish yet: We increment
+     the fourth bit on __wrefs. Relaxed MO is enough. The synchronization
+     happens because __condvar_dec_grefs uses release MO. */
+  atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
   __condvar_dec_grefs (cond, g, cbuffer->private);

   __condvar_cancel_waiting (cond, cbuffer->wseq >> 1, g, cbuffer->private);
@@ -183,7 +178,12 @@ __condvar_cleanup_waiting (void *arg)
      conservatively.  */
   futex_wake (cond->__data.__g_signals + g, 1, cbuffer->private);

-  __condvar_confirm_wakeup (cond, cbuffer->private);
+  /* If destruction is pending (i.e., the wake-request flag is nonzero) and we
+     are the last waiter (prior value of __wrefs was 1 << 3), then wake any
+     threads waiting in pthread_cond_destroy.  Release MO to synchronize with
+     these threads.  Don't bother clearing the wake-up request flag.  */
+  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) == 3)
+    futex_wake (&cond->__data.__wrefs, INT_MAX, cbuffer->private);

   /* XXX If locking the mutex fails, should we just stop execution?  This
      might be better than silently ignoring the error.  */
@@ -287,20 +287,21 @@ __condvar_cleanup_waiting (void *arg)
    __g1_orig_size: Initial size of G1
      * The two least-significant bits represent the condvar-internal lock.
      * Only accessed while having acquired the condvar-internal lock.
-   __wrefs: Waiter reference counter.
+   __wrefs: Flags and count of waiters who called pthread_cancel.
      * Bit 2 is true if waiters should run futex_wake when they remove the
        last reference.  pthread_cond_destroy uses this as futex word.
      * Bit 1 is the clock ID (0 == CLOCK_REALTIME, 1 == CLOCK_MONOTONIC).
      * Bit 0 is true iff this is a process-shared condvar.
-     * Simple reference count used by both waiters and pthread_cond_destroy.
-     (If the format of __wrefs is changed, update nptl_lock_constants.pysym
-      and the pretty printers.)
+     * Simple reference count used by __condvar_cleanup_waiting and pthread_cond_destroy.
+     (If the format of __wrefs is changed, update the pretty printers.)
    For each of the two groups, we have:
    __g_refs: Futex waiter reference count.
      * LSB is true if waiters should run futex_wake when they remove the
        last reference.
      * Reference count used by waiters concurrently with signalers that have
        acquired the condvar-internal lock.
+     (If the format of __g_refs is changed, update nptl_lock_constants.pysym
+      and the pretty printers.)
    __g_signals: The number of signals that can still be consumed.
      * Used as a futex word by waiters.  Used concurrently by waiters and
        signalers.
@@ -409,9 +410,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      synchronize with the dummy read-modify-write in
      __condvar_quiesce_and_switch_g1 if we read from that.  */
   atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
-  /* Increase the waiter reference count.  Relaxed MO is sufficient because
-     we only need to synchronize when decrementing the reference count.  */
-  unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
   int private = __condvar_get_private (flags);

   /* Now that we are registered as a waiter, we can release the mutex.
@@ -425,7 +424,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   if (__glibc_unlikely (err != 0))
     {
       __condvar_cancel_waiting (cond, seq, g, private);
-      __condvar_confirm_wakeup (cond, private);
       __condvar_dec_grefs (cond, g, private);
       return err;
     }
@@ -530,7 +528,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   /* Confirm that we have been woken.  We do that before acquiring the mutex
      to allow for execution of pthread_cond_destroy while having acquired the
      mutex.  */
-  __condvar_confirm_wakeup (cond, private);
   __condvar_dec_grefs (cond, g, private);

   /* Woken up; now re-acquire the mutex.  If this doesn't fail, return RESULT,
--
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 4/5] nptl: Make test-cond-printers check the number of waiters
  2021-01-16 20:49 [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Malte Skarupke
  2021-01-16 20:49 ` [PATCH 2/5] nptl: Remove the signal-stealing code. It is no longer needed Malte Skarupke
  2021-01-16 20:49 ` [PATCH 3/5] nptl: Optimization by not incrementing wrefs in pthread_cond_wait Malte Skarupke
@ 2021-01-16 20:49 ` Malte Skarupke
  2021-01-16 20:49 ` [PATCH 5/5] nptl: Rename __wrefs to __flags because its meaning has changed Malte Skarupke
  2021-01-18 22:43 ` [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Torvald Riegel
  4 siblings, 0 replies; 8+ messages in thread
From: Malte Skarupke @ 2021-01-16 20:49 UTC (permalink / raw)
  To: libc-alpha; +Cc: malteskarupke, triegel, Malte Skarupke

In my last change I changed the semantics of how to determine the
number of waiters on a condition variable. The existing test only
tested that the printers print something. They didn't cover the case
when there is a thread sleeping on the condition variable. In this
patch I changed the test to ensure that the correct number is printed.

This is just to double-check the changes from my previous patch.
---
 nptl/test-cond-printers.c  | 56 +++++++++++++++++++++++++++++++++-----
 nptl/test-cond-printers.py |  5 ++++
 2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/nptl/test-cond-printers.c b/nptl/test-cond-printers.c
index 4b6db831f9..603a7cccee 100644
--- a/nptl/test-cond-printers.c
+++ b/nptl/test-cond-printers.c
@@ -26,7 +26,14 @@
 #define PASS 0
 #define FAIL 1

-static int test_status_destroyed (pthread_cond_t *condvar);
+static int test_status (pthread_cond_t *condvar);
+
+typedef struct
+{
+  pthread_mutex_t *mutex;
+  pthread_cond_t *condvar;
+  int *wait_thread_asleep;
+} test_state;

 int
 main (void)
@@ -36,22 +43,57 @@ main (void)
   int result = FAIL;

   if (pthread_condattr_init (&attr) == 0
-      && test_status_destroyed (&condvar) == PASS)
+      && test_status (&condvar) == PASS)
     result = PASS;
   /* Else, one of the pthread_cond* functions failed.  */

   return result;
 }

+static void *
+wait (void *arg)
+{
+  test_state *state = (test_state *)arg;
+  void *result = PASS;
+  if (pthread_mutex_lock (state->mutex) != 0)
+    result = (void *)FAIL;
+  *state->wait_thread_asleep = 1;
+  if (pthread_cond_signal (state->condvar) != 0)
+    result = (void *)FAIL;
+  if (pthread_cond_wait (state->condvar, state->mutex) != 0)
+    result = (void *)FAIL;
+  if (pthread_mutex_unlock (state->mutex) != 0)
+    result = (void *)FAIL;
+  return result;
+}
+
 /* Initializes CONDVAR, then destroys it.  */
 static int
-test_status_destroyed (pthread_cond_t *condvar)
+test_status (pthread_cond_t *condvar)
 {
-  int result = FAIL;
+  int result = PASS;

-  if (pthread_cond_init (condvar, NULL) == 0
-      && pthread_cond_destroy (condvar) == 0)
-    result = PASS; /* Test status (destroyed).  */
+  pthread_mutex_t mutex;
+  result |= pthread_mutex_init (&mutex, NULL);
+  result |= pthread_cond_init (condvar, NULL);
+  int wait_thread_asleep = 0;
+  test_state state = { &mutex, condvar, &wait_thread_asleep };
+  result |= pthread_mutex_lock (&mutex);
+  pthread_t thread;
+  result |= pthread_create (&thread, NULL, wait, &state);
+  while (!wait_thread_asleep)
+    {
+      result |= pthread_cond_wait (condvar, &mutex);
+    }
+  result |= pthread_cond_signal (condvar); /* Test about to signal */
+  result |= pthread_mutex_unlock (&mutex);
+  result |= pthread_cond_destroy (condvar);
+  void *retval = NULL;
+  result |= pthread_join (thread, &retval);  /* Test status (destroyed).  */
+  result |= pthread_mutex_destroy (&mutex);
+  result = result ? FAIL : PASS;
+  if (retval != NULL)
+    result = FAIL;

   return result;
 }
diff --git a/nptl/test-cond-printers.py b/nptl/test-cond-printers.py
index 38e2da4269..ab12218802 100644
--- a/nptl/test-cond-printers.py
+++ b/nptl/test-cond-printers.py
@@ -33,6 +33,11 @@ try:
     var = 'condvar'
     to_string = 'pthread_cond_t'

+    break_at(test_source, 'Test about to signal')
+    continue_cmd() # Go to test_status_destroyed
+    test_printer(var, to_string, {'Threads known to still execute a wait function': '1'})
+
+
     break_at(test_source, 'Test status (destroyed)')
     continue_cmd() # Go to test_status_destroyed
     test_printer(var, to_string, {'Threads known to still execute a wait function': '0'})
--
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 5/5] nptl: Rename __wrefs to __flags because its meaning has changed
  2021-01-16 20:49 [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Malte Skarupke
                   ` (2 preceding siblings ...)
  2021-01-16 20:49 ` [PATCH 4/5] nptl: Make test-cond-printers check the number of waiters Malte Skarupke
@ 2021-01-16 20:49 ` Malte Skarupke
  2021-01-18 23:47   ` Torvald Riegel
  2021-01-18 22:43 ` [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Torvald Riegel
  4 siblings, 1 reply; 8+ messages in thread
From: Malte Skarupke @ 2021-01-16 20:49 UTC (permalink / raw)
  To: libc-alpha; +Cc: malteskarupke, triegel, Malte Skarupke

When I remove the increment/decrement of wrefs in pthread_cond_wait,
it no longer really had the meaning of representing the number of
waiters. So the name "wrefs" is no longer accurate. It is still used
as a reference count in an edge case, in the interaction between
pthread_cancel and pthread_cond_destroy, but that edge case shouldn't
be what this variable is named after.

The name __flags seems more appropriate since in the most common case
this variable is just used to store the flags of the condition
variable.
---
 nptl/nptl-printers.py                   |  6 +++---
 nptl/pthread_cond_broadcast.c           |  2 +-
 nptl/pthread_cond_common.c              |  4 ++--
 nptl/pthread_cond_destroy.c             |  8 ++++----
 nptl/pthread_cond_init.c                |  4 ++--
 nptl/pthread_cond_signal.c              |  2 +-
 nptl/pthread_cond_wait.c                | 16 ++++++++--------
 nptl/tst-cond22.c                       |  4 ++--
 sysdeps/nptl/bits/thread-shared-types.h |  2 +-
 9 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/nptl/nptl-printers.py b/nptl/nptl-printers.py
index 1a404befe5..0e37d8b4dd 100644
--- a/nptl/nptl-printers.py
+++ b/nptl/nptl-printers.py
@@ -312,7 +312,7 @@ class ConditionVariablePrinter(object):
         """

         data = cond['__data']
-        self.wrefs = data['__wrefs']
+        self.flags = data['__flags']
         self.grefs = data['__g_refs']
         self.values = []

@@ -359,12 +359,12 @@ class ConditionVariablePrinter(object):
     def read_attributes(self):
         """Read the condvar's attributes."""

-        if (self.wrefs & PTHREAD_COND_CLOCK_MONOTONIC_MASK) != 0:
+        if (self.flags & PTHREAD_COND_CLOCK_MONOTONIC_MASK) != 0:
             self.values.append(('Clock ID', 'CLOCK_MONOTONIC'))
         else:
             self.values.append(('Clock ID', 'CLOCK_REALTIME'))

-        if (self.wrefs & PTHREAD_COND_SHARED_MASK) != 0:
+        if (self.flags & PTHREAD_COND_SHARED_MASK) != 0:
             self.values.append(('Shared', 'Yes'))
         else:
             self.values.append(('Shared', 'No'))
diff --git a/nptl/pthread_cond_broadcast.c b/nptl/pthread_cond_broadcast.c
index e10432ce7c..cbb249b35b 100644
--- a/nptl/pthread_cond_broadcast.c
+++ b/nptl/pthread_cond_broadcast.c
@@ -44,7 +44,7 @@ __pthread_cond_broadcast (pthread_cond_t *cond)
   unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
   if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__flags);
   int private = __condvar_get_private (flags);

   __condvar_acquire_lock (cond, private);
diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
index 3251c7f0ec..22270e8805 100644
--- a/nptl/pthread_cond_common.c
+++ b/nptl/pthread_cond_common.c
@@ -20,7 +20,7 @@
 #include <stdint.h>
 #include <pthread.h>

-/* We need 3 least-significant bits on __wrefs for something else.  */
+/* We need 3 least-significant bits on __flags for something else.  */
 #define __PTHREAD_COND_MAX_GROUP_SIZE ((unsigned) 1 << 29)

 #if __HAVE_64B_ATOMICS == 1
@@ -318,7 +318,7 @@ __condvar_set_orig_size (pthread_cond_t *cond, unsigned int size)
     atomic_store_relaxed (&cond->__data.__g1_orig_size, (size << 2) | 2);
 }

-/* Returns FUTEX_SHARED or FUTEX_PRIVATE based on the provided __wrefs
+/* Returns FUTEX_SHARED or FUTEX_PRIVATE based on the provided __flags
    value.  */
 static int __attribute__ ((unused))
 __condvar_get_private (int flags)
diff --git a/nptl/pthread_cond_destroy.c b/nptl/pthread_cond_destroy.c
index 1c27385f89..3dfe4a48db 100644
--- a/nptl/pthread_cond_destroy.c
+++ b/nptl/pthread_cond_destroy.c
@@ -43,7 +43,7 @@ __pthread_cond_destroy (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_destroy, 1, cond);

-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__flags);
   int private = __condvar_get_private (flags);
   for (unsigned g = 0; g < 2; ++g)
     {
@@ -62,11 +62,11 @@ __pthread_cond_destroy (pthread_cond_t *cond)

   /* Same as above, except to synchronize with canceled threads.  This wake
      flag never gets cleared, so it's enough to set it once.  */
-  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4) | 4;
+  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__flags, 4) | 4;
   while (wrefs >> 3 != 0)
     {
-      futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
-      wrefs = atomic_load_acquire (&cond->__data.__wrefs);
+      futex_wait_simple (&cond->__data.__flags, wrefs, private);
+      wrefs = atomic_load_acquire (&cond->__data.__flags);
     }
   /* The memory the condvar occupies can now be reused.  */
   return 0;
diff --git a/nptl/pthread_cond_init.c b/nptl/pthread_cond_init.c
index 595b1b3528..3031d52a42 100644
--- a/nptl/pthread_cond_init.c
+++ b/nptl/pthread_cond_init.c
@@ -37,13 +37,13 @@ __pthread_cond_init (pthread_cond_t *cond, const pthread_condattr_t *cond_attr)

   /* Iff not equal to ~0l, this is a PTHREAD_PROCESS_PRIVATE condvar.  */
   if (icond_attr != NULL && (icond_attr->value & 1) != 0)
-    cond->__data.__wrefs |= __PTHREAD_COND_SHARED_MASK;
+    cond->__data.__flags |= __PTHREAD_COND_SHARED_MASK;
   int clockid = (icond_attr != NULL
 		 ? ((icond_attr->value >> 1) & ((1 << COND_CLOCK_BITS) - 1))
 		 : CLOCK_REALTIME);
   /* If 0, CLOCK_REALTIME is used; CLOCK_MONOTONIC otherwise.  */
   if (clockid != CLOCK_REALTIME)
-    cond->__data.__wrefs |= __PTHREAD_COND_CLOCK_MONOTONIC_MASK;
+    cond->__data.__flags |= __PTHREAD_COND_CLOCK_MONOTONIC_MASK;

   LIBC_PROBE (cond_init, 2, cond, cond_attr);

diff --git a/nptl/pthread_cond_signal.c b/nptl/pthread_cond_signal.c
index 0cd534cc40..979c9d72d5 100644
--- a/nptl/pthread_cond_signal.c
+++ b/nptl/pthread_cond_signal.c
@@ -43,7 +43,7 @@ __pthread_cond_signal (pthread_cond_t *cond)
   unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
   if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__flags);
   int private = __condvar_get_private (flags);

   __condvar_acquire_lock (cond, private);
diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 0ee0247874..0993728e5d 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -164,9 +164,9 @@ __condvar_cleanup_waiting (void *arg)
      determine when all waiters have woken. Since we will do more work in this
      function, we are using an extra channel to communicate to
      pthread_cond_destroy that it is not allowed to finish yet: We increment
-     the fourth bit on __wrefs. Relaxed MO is enough. The synchronization
+     the fourth bit on __flags. Relaxed MO is enough. The synchronization
      happens because __condvar_dec_grefs uses release MO. */
-  atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
+  atomic_fetch_add_relaxed (&cond->__data.__flags, 8);
   __condvar_dec_grefs (cond, g, cbuffer->private);

   __condvar_cancel_waiting (cond, cbuffer->wseq >> 1, g, cbuffer->private);
@@ -182,8 +182,8 @@ __condvar_cleanup_waiting (void *arg)
      are the last waiter (prior value of __wrefs was 1 << 3), then wake any
      threads waiting in pthread_cond_destroy.  Release MO to synchronize with
      these threads.  Don't bother clearing the wake-up request flag.  */
-  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) == 3)
-    futex_wake (&cond->__data.__wrefs, INT_MAX, cbuffer->private);
+  if ((atomic_fetch_add_release (&cond->__data.__flags, -8) >> 2) == 3)
+    futex_wake (&cond->__data.__flags, INT_MAX, cbuffer->private);

   /* XXX If locking the mutex fails, should we just stop execution?  This
      might be better than silently ignoring the error.  */
@@ -287,13 +287,13 @@ __condvar_cleanup_waiting (void *arg)
    __g1_orig_size: Initial size of G1
      * The two least-significant bits represent the condvar-internal lock.
      * Only accessed while having acquired the condvar-internal lock.
-   __wrefs: Flags and count of waiters who called pthread_cancel.
+   __flags: Flags and count of waiters who called pthread_cancel.
      * Bit 2 is true if waiters should run futex_wake when they remove the
        last reference.  pthread_cond_destroy uses this as futex word.
      * Bit 1 is the clock ID (0 == CLOCK_REALTIME, 1 == CLOCK_MONOTONIC).
      * Bit 0 is true iff this is a process-shared condvar.
      * Simple reference count used by __condvar_cleanup_waiting and pthread_cond_destroy.
-     (If the format of __wrefs is changed, update the pretty printers.)
+     (If the format of __flags is changed, update the pretty printers.)
    For each of the two groups, we have:
    __g_refs: Futex waiter reference count.
      * LSB is true if waiters should run futex_wake when they remove the
@@ -410,7 +410,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      synchronize with the dummy read-modify-write in
      __condvar_quiesce_and_switch_g1 if we read from that.  */
   atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__flags);
   int private = __condvar_get_private (flags);

   /* Now that we are registered as a waiter, we can release the mutex.
@@ -558,7 +558,7 @@ __pthread_cond_timedwait64 (pthread_cond_t *cond, pthread_mutex_t *mutex,

   /* Relaxed MO is suffice because clock ID bit is only modified
      in condition creation.  */
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__flags);
   clockid_t clockid = (flags & __PTHREAD_COND_CLOCK_MONOTONIC_MASK)
                     ? CLOCK_MONOTONIC : CLOCK_REALTIME;
   return __pthread_cond_wait_common (cond, mutex, clockid, abstime);
diff --git a/nptl/tst-cond22.c b/nptl/tst-cond22.c
index 64f19ea0a5..e1338ebf94 100644
--- a/nptl/tst-cond22.c
+++ b/nptl/tst-cond22.c
@@ -110,7 +110,7 @@ do_test (void)
 	  c.__data.__wseq, c.__data.__g1_start,
 	  c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0],
 	  c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1],
-	  c.__data.__g1_orig_size, c.__data.__wrefs);
+	  c.__data.__g1_orig_size, c.__data.__flags);

   if (pthread_create (&th, NULL, tf, (void *) 1l) != 0)
     {
@@ -153,7 +153,7 @@ do_test (void)
 	  c.__data.__wseq, c.__data.__g1_start,
 	  c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0],
 	  c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1],
-	  c.__data.__g1_orig_size, c.__data.__wrefs);
+	  c.__data.__g1_orig_size, c.__data.__flags);

   return status;
 }
diff --git a/sysdeps/nptl/bits/thread-shared-types.h b/sysdeps/nptl/bits/thread-shared-types.h
index fbbdd0bb36..84cedfcaa0 100644
--- a/sysdeps/nptl/bits/thread-shared-types.h
+++ b/sysdeps/nptl/bits/thread-shared-types.h
@@ -112,7 +112,7 @@ struct __pthread_cond_s
   unsigned int __g_refs[2] __LOCK_ALIGNMENT;
   unsigned int __g_size[2];
   unsigned int __g1_orig_size;
-  unsigned int __wrefs;
+  unsigned int __flags;
   unsigned int __g_signals[2];
 };

--
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847)
  2021-01-16 20:49 [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Malte Skarupke
                   ` (3 preceding siblings ...)
  2021-01-16 20:49 ` [PATCH 5/5] nptl: Rename __wrefs to __flags because its meaning has changed Malte Skarupke
@ 2021-01-18 22:43 ` Torvald Riegel
  4 siblings, 0 replies; 8+ messages in thread
From: Torvald Riegel @ 2021-01-18 22:43 UTC (permalink / raw)
  To: Malte Skarupke, libc-alpha; +Cc: malteskarupke

On Sat, 2021-01-16 at 15:49 -0500, Malte Skarupke wrote:
> This change is the minimal amount of changes necessary to fix the
> bug.
> This leads to slightly slower performance, but the next two patches
> in this series will undo most of that damage.

Is this based on experiments, or an assumption based on reasoning?

Which kinds of workloads are you talking about (e.g., high vs. low
contention, ...).

> ---
>  nptl/pthread_cond_wait.c | 29 +++++++++++------------------
>  1 file changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
> index 02d11c61db..0f50048c0b 100644
> --- a/nptl/pthread_cond_wait.c
> +++ b/nptl/pthread_cond_wait.c
> @@ -405,6 +405,10 @@ __pthread_cond_wait_common (pthread_cond_t
> *cond, pthread_mutex_t *mutex,
>    unsigned int g = wseq & 1;
>    uint64_t seq = wseq >> 1;
> 
> +  /* Acquire a group reference and use acquire MO for that so that
> we
> +     synchronize with the dummy read-modify-write in
> +     __condvar_quiesce_and_switch_g1 if we read from that.  */
> +  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);

Please explain the choice of MO properly in comments, unless it's
obvious.  In this example, you just state that you want to have the
synchronize-with relation, but not why.

The comment you broke up has the second part ("In turn, ...") that
explains why we want to have it.  But have you checked that moving the
reference acquisition to earlier will still be correct, in particular
regarding MOs?  Your model didn't include MOs, IIRC, so this needs
reasoning.  We also want to make sure that our future selfs still can
reconstruct this understanding without having to start from scratch, so
we really need good comments.

>    /* Increase the waiter reference count.  Relaxed MO is sufficient
> because
>       we only need to synchronize when decrementing the reference
> count.  */
>    unsigned int flags = atomic_fetch_add_relaxed (&cond-
> >__data.__wrefs, 8);
> @@ -422,6 +426,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond,
> pthread_mutex_t *mutex,
>      {
>        __condvar_cancel_waiting (cond, seq, g, private);
>        __condvar_confirm_wakeup (cond, private);
> +      __condvar_dec_grefs (cond, g, private);
>        return err;
>      }
> 
> @@ -471,24 +476,14 @@ __pthread_cond_wait_common (pthread_cond_t
> *cond, pthread_mutex_t *mutex,
>  	    break;
> 
>  	  /* No signals available after spinning, so prepare to block.
> -	     We first acquire a group reference and use acquire MO for
> that so
> -	     that we synchronize with the dummy read-modify-write in
> -	     __condvar_quiesce_and_switch_g1 if we read from that.  In
> turn,
> -	     in this case this will make us see the closed flag on
> __g_signals
> -	     that designates a concurrent attempt to reuse the group's
> slot.
> -	     We use acquire MO for the __g_signals check to make the
> -	     __g1_start check work (see spinning above).
> -	     Note that the group reference acquisition will not mask
> the
> -	     release MO when decrementing the reference count because
> we use
> -	     an atomic read-modify-write operation and thus extend the
> release
> -	     sequence.  */

You lose this last sentence, but it matters when explaining this.

> -	  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
> +	     First check the closed flag on __g_signals that designates
> a
> +	     concurrent attempt to reuse the group's slot. We use
> acquire MO for
> +	     the __g_signals check to make the __g1_start check work
> (see
> +	     spinning above).  */

See above.

>  	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1)
> != 0)
>  	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
>  	    {
> -	      /* Our group is closed.  Wake up any signalers that might
> be
> -		 waiting.  */
> -	      __condvar_dec_grefs (cond, g, private);
> +	      /* Our group is closed.  */
>  	      goto done;
>  	    }
> 
> @@ -508,7 +503,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond,
> pthread_mutex_t *mutex,
> 
>  	  if (__glibc_unlikely (err == ETIMEDOUT || err == EOVERFLOW))
>  	    {
> -	      __condvar_dec_grefs (cond, g, private);
>  	      /* If we timed out, we effectively cancel waiting.  Note
> that
>  		 we have decremented __g_refs before cancellation, so
> that a
>  		 deadlock between waiting for quiescence of our group
> in
> @@ -518,8 +512,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond,
> pthread_mutex_t *mutex,
>  	      result = err;
>  	      goto done;
>  	    }
> -	  else
> -	    __condvar_dec_grefs (cond, g, private);
> 
>  	  /* Reload signals.  See above for MO.  */
>  	  signals = atomic_load_acquire (cond->__data.__g_signals + g);
> @@ -602,6 +594,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond,
> pthread_mutex_t *mutex,
>       to allow for execution of pthread_cond_destroy while having
> acquired the
>       mutex.  */
>    __condvar_confirm_wakeup (cond, private);
> +  __condvar_dec_grefs (cond, g, private);

This is the wrong order.  After confirming the wakeup for cond_destroy,
the thread must not touch the condvar memory anymore (including reading
it because destruction could unmap it, for example; futex calls on it
are fine).


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/5] nptl: Optimization by not incrementing wrefs in pthread_cond_wait
  2021-01-16 20:49 ` [PATCH 3/5] nptl: Optimization by not incrementing wrefs in pthread_cond_wait Malte Skarupke
@ 2021-01-18 23:41   ` Torvald Riegel
  0 siblings, 0 replies; 8+ messages in thread
From: Torvald Riegel @ 2021-01-18 23:41 UTC (permalink / raw)
  To: Malte Skarupke, libc-alpha; +Cc: malteskarupke

On Sat, 2021-01-16 at 15:49 -0500, Malte Skarupke wrote:
> After I broadened the scope of grefs, it covered mostly the same
> scope
> as wrefs. The duplicate atomic increment/decrement was unnecessary. 

This will not work as is because __wrefs and __g_refs are used
differently, which matters for destruction safety.  See below for
details.

You should be able to fix it, but this may require a CAS instead of an
read-modify-write when decreasing group ref counts.  It might just be
faster to keep __wrefs (and less complex too).

The real potential for a performance degradation is not in the number
of atomic ops anyway, IMO, but in that your changes now require
signalers to wait for waiters when switching groups, even if the
waiters were just spinning (ie, in high-throughput scenarios).

AFAIA glibc is still lacking proper tuning of when to switch from
spinning to blocking via futexes, but this change should decrease the
benefits of spinning in condvars.

> diff --git a/nptl/pthread_cond_broadcast.c
> b/nptl/pthread_cond_broadcast.c
> index 8d887aab93..e10432ce7c 100644
> --- a/nptl/pthread_cond_broadcast.c
> +++ b/nptl/pthread_cond_broadcast.c
> @@ -40,10 +40,12 @@ __pthread_cond_broadcast (pthread_cond_t *cond)
>  {
>    LIBC_PROBE (cond_broadcast, 1, cond);
> 
> -  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
> -  if (wrefs >> 3 == 0)
> +  unsigned int grefs0 = atomic_load_relaxed (cond->__data.__g_refs);
> +  unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs +
> 1);
> +  if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)

See below.

>      return 0;
> -  int private = __condvar_get_private (wrefs);
> +  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
> +  int private = __condvar_get_private (flags);
> 
>    __condvar_acquire_lock (cond, private);
> 
> diff --git a/nptl/pthread_cond_destroy.c
> b/nptl/pthread_cond_destroy.c
> index 31034905d1..1c27385f89 100644
> --- a/nptl/pthread_cond_destroy.c
> +++ b/nptl/pthread_cond_destroy.c
> @@ -37,22 +37,35 @@
>     signal or broadcast calls.
>     Thus, we can assume that all waiters that are still accessing the
> condvar
>     have been woken.  We wait until they have confirmed to have woken
> up by
> -   decrementing __wrefs.  */
> +   decrementing __g_refs.  */
>  int
>  __pthread_cond_destroy (pthread_cond_t *cond)
>  {
>    LIBC_PROBE (cond_destroy, 1, cond);
> 
> -  /* Set the wake request flag.  We could also spin, but destruction
> that is
> -     concurrent with still-active waiters is probably neither common
> nor
> -     performance critical.  Acquire MO to synchronize with waiters
> confirming
> -     that they finished.  */
> -  unsigned int wrefs = atomic_fetch_or_acquire (&cond-
> >__data.__wrefs, 4);
> -  int private = __condvar_get_private (wrefs);
> +  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
> +  int private = __condvar_get_private (flags);
> +  for (unsigned g = 0; g < 2; ++g)
> +    {
> +      while (true)
> +	{
> +	  /* Set the wake request flag.  We could also spin, but
> destruction that is
> +	     concurrent with still-active waiters is probably neither
> common nor
> +	     performance critical.  Acquire MO to synchronize with
> waiters confirming
> +	     that they finished.  */
> +	  unsigned r = atomic_fetch_or_acquire (cond->__data.__g_refs +
> g, 1) | 1;
> +	  if (r == 1)
> +	    break;

You wait until the refcount is zero, but that is not necessarily the
last access to cond in __condvar_dev_grefs.  Hence this does not ensure
destruction safety.  Also see below.

> +	  futex_wait_simple (cond->__data.__g_refs + g, r, private);
> +	}
> +    }
> +
> +  /* Same as above, except to synchronize with canceled
> threads.  This wake
> +     flag never gets cleared, so it's enough to set it once.  */
> +  unsigned int wrefs = atomic_fetch_or_acquire (&cond-
> >__data.__wrefs, 4) | 4;
>    while (wrefs >> 3 != 0)
>      {
>        futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
> -      /* See above.  */
>        wrefs = atomic_load_acquire (&cond->__data.__wrefs);
>      }
>    /* The memory the condvar occupies can now be reused.  */
> diff --git a/nptl/pthread_cond_signal.c b/nptl/pthread_cond_signal.c
> index 4281ad4d3b..0cd534cc40 100644
> --- a/nptl/pthread_cond_signal.c
> +++ b/nptl/pthread_cond_signal.c
> @@ -39,10 +39,12 @@ __pthread_cond_signal (pthread_cond_t *cond)
>    /* First check whether there are waiters.  Relaxed MO is fine for
> that for
>       the same reasons that relaxed MO is fine when observing __wseq
> (see
>       below).  */
> -  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
> -  if (wrefs >> 3 == 0)
> +  unsigned int grefs0 = atomic_load_relaxed (cond->__data.__g_refs);
> +  unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs +
> 1);
> +  if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
>      return 0;

This really needs an explanation why that is supposed to work.  The
existing comments talk about a single atomic load of wseq, but here you
have two separate atomic loads.

I believe it should be correct, but this isn't obvious and thus should
be clearly explained in a comment.

> -  int private = __condvar_get_private (wrefs);
> +  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
> +  int private = __condvar_get_private (flags);
> 
>    __condvar_acquire_lock (cond, private);
> 
> diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
> index 7b825f81df..0ee0247874 100644
> --- a/nptl/pthread_cond_wait.c
> +++ b/nptl/pthread_cond_wait.c
> @@ -43,19 +43,6 @@ struct _condvar_cleanup_buffer
>  };
> 
> 
> -/* Decrease the waiter reference count.  */
> -static void
> -__condvar_confirm_wakeup (pthread_cond_t *cond, int private)
> -{
> -  /* If destruction is pending (i.e., the wake-request flag is
> nonzero) and we
> -     are the last waiter (prior value of __wrefs was 1 << 3), then
> wake any
> -     threads waiting in pthread_cond_destroy.  Release MO to
> synchronize with
> -     these threads.  Don't bother clearing the wake-up request
> flag.  */
> -  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) ==
> 3)
> -    futex_wake (&cond->__data.__wrefs, INT_MAX, private);
> -}
> -
> -
>  /* Cancel waiting after having registered as a waiter
> previously.  SEQ is our
>     position and G is our group index.
>     The goal of cancellation is to make our group smaller if that is
> still
> @@ -81,7 +68,7 @@ __condvar_cancel_waiting (pthread_cond_t *cond,
> uint64_t seq, unsigned int g,
>  {
>    bool consumed_signal = false;
> 
> -  /* No deadlock with group switching is possible here because we
> have do
> +  /* No deadlock with group switching is possible here because we do
>       not hold a reference on the group.  */
>    __condvar_acquire_lock (cond, private);
> 
> @@ -172,6 +159,14 @@ __condvar_cleanup_waiting (void *arg)
>    pthread_cond_t *cond = cbuffer->cond;
>    unsigned g = cbuffer->wseq & 1;
> 
> +  /* Normally we are not allowed to touch cond any more after 

, after "Normally" and s/any more/anymore/

> calling
> +     __condvar_dec_grefs

Precisely, we are not allowed to touch memory anymore after
__condvar_confirm_wakeup -- which takes different actions than
__condvar_dec_grefs.  So these aren't quite the same.

> , because pthread_cond_destroy looks at __g_refs to
> +     determine when all waiters have woken. Since we will do more
> work in this
> +     function, we are using an extra channel to communicate to
> +     pthread_cond_destroy that it is not allowed to finish yet: We
> increment
> +     the fourth bit on __wrefs.

You increment the refcount consisting of the bits starting at the
fourth bit, not just the fourth bit.

> Relaxed MO is enough. The synchronization
> +     happens because __condvar_dec_grefs uses release MO. */
> +  atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
>    __condvar_dec_grefs (cond, g, cbuffer->private);
> 
>    __condvar_cancel_waiting (cond, cbuffer->wseq >> 1, g, cbuffer-
> >private);
> @@ -183,7 +178,12 @@ __condvar_cleanup_waiting (void *arg)
>       conservatively.  */
>    futex_wake (cond->__data.__g_signals + g, 1, cbuffer->private);
> 
> -  __condvar_confirm_wakeup (cond, cbuffer->private);
> +  /* If destruction is pending (i.e., the wake-request flag is
> nonzero) and we
> +     are the last waiter (prior value of __wrefs was 1 << 3), then
> wake any
> +     threads waiting in pthread_cond_destroy.  Release MO to
> synchronize with
> +     these threads.  Don't bother clearing the wake-up request
> flag.  */
> +  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) ==
> 3)
> +    futex_wake (&cond->__data.__wrefs, INT_MAX, cbuffer->private);
> 
>    /* XXX If locking the mutex fails, should we just stop
> execution?  This
>       might be better than silently ignoring the error.  */
> @@ -287,20 +287,21 @@ __condvar_cleanup_waiting (void *arg)
>     __g1_orig_size: Initial size of G1
>       * The two least-significant bits represent the condvar-internal 
> lock.
>       * Only accessed while having acquired the condvar-internal
> lock.
> -   __wrefs: Waiter reference counter.
> +   __wrefs: Flags and count of waiters who called pthread_cancel.

...and reference counter for waiters that called...

Also update the previous paragraphs in this algorithm overview that
discuss __g_refs and __wrefs. 

>       * Bit 2 is true if waiters should run futex_wake when they
> remove the
>         last reference.  pthread_cond_destroy uses this as futex
> word.
>       * Bit 1 is the clock ID (0 == CLOCK_REALTIME, 1 ==
> CLOCK_MONOTONIC).
>       * Bit 0 is true iff this is a process-shared condvar.
> -     * Simple reference count used by both waiters and
> pthread_cond_destroy.
> -     (If the format of __wrefs is changed, update
> nptl_lock_constants.pysym
> -      and the pretty printers.)
> +     * Simple reference count used by __condvar_cleanup_waiting and
> pthread_cond_destroy.
> +     (If the format of __wrefs is changed, update the pretty
> printers.)
>     For each of the two groups, we have:
>     __g_refs: Futex waiter reference count.
>       * LSB is true if waiters should run futex_wake when they remove
> the
>         last reference.
>       * Reference count used by waiters concurrently with signalers
> that have
>         acquired the condvar-internal lock.
> +     (If the format of __g_refs is changed, update
> nptl_lock_constants.pysym
> +      and the pretty printers.)
>     __g_signals: The number of signals that can still be consumed.
>       * Used as a futex word by waiters.  Used concurrently by
> waiters and
>         signalers.
> @@ -409,9 +410,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond,
> pthread_mutex_t *mutex,
>       synchronize with the dummy read-modify-write in
>       __condvar_quiesce_and_switch_g1 if we read from that.  */
>    atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
> -  /* Increase the waiter reference count.  Relaxed MO is sufficient
> because
> -     we only need to synchronize when decrementing the reference
> count.  */
> -  unsigned int flags = atomic_fetch_add_relaxed (&cond-
> >__data.__wrefs, 8);
> +  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
>    int private = __condvar_get_private (flags);
> 
>    /* Now that we are registered as a waiter, we can release the
> mutex.
> @@ -425,7 +424,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond,
> pthread_mutex_t *mutex,
>    if (__glibc_unlikely (err != 0))
>      {
>        __condvar_cancel_waiting (cond, seq, g, private);
> -      __condvar_confirm_wakeup (cond, private);
>        __condvar_dec_grefs (cond, g, private);
>        return err;
>      }
> @@ -530,7 +528,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond,
> pthread_mutex_t *mutex,
>    /* Confirm that we have been woken.  We do that before acquiring
> the mutex
>       to allow for execution of pthread_cond_destroy while having
> acquired the
>       mutex.  */
> -  __condvar_confirm_wakeup (cond, private);
>    __condvar_dec_grefs (cond, g, private);

__condvar_dec_grefs clears the wake-up request flag after resetting the
waiter flag.  Thus, the former will not be the last access to the
condvar.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5/5] nptl: Rename __wrefs to __flags because its meaning has changed
  2021-01-16 20:49 ` [PATCH 5/5] nptl: Rename __wrefs to __flags because its meaning has changed Malte Skarupke
@ 2021-01-18 23:47   ` Torvald Riegel
  0 siblings, 0 replies; 8+ messages in thread
From: Torvald Riegel @ 2021-01-18 23:47 UTC (permalink / raw)
  To: Malte Skarupke, libc-alpha; +Cc: malteskarupke

On Sat, 2021-01-16 at 15:49 -0500, Malte Skarupke wrote:
> When I remove the increment/decrement of wrefs in pthread_cond_wait,
> it no longer really had the meaning of representing the number of
> waiters. So the name "wrefs" is no longer accurate. It is still used
> as a reference count in an edge case, in the interaction between
> pthread_cancel and pthread_cond_destroy, but that edge case shouldn't
> be what this variable is named after.

I don't think that this change is good.  Cancellation is not
necessarily an "edge case" because it will happen whenever timeouts are
involved.  More importantly, the wake-up flag is an integral part of
the refcount.  The other flags in there are just in there because
available space was scarce (ABI...).  So if you want to rename it, I'd
make it "cancellation_refs" or "crefs" or something like that, not just
"flags".

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-01-18 23:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-16 20:49 [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Malte Skarupke
2021-01-16 20:49 ` [PATCH 2/5] nptl: Remove the signal-stealing code. It is no longer needed Malte Skarupke
2021-01-16 20:49 ` [PATCH 3/5] nptl: Optimization by not incrementing wrefs in pthread_cond_wait Malte Skarupke
2021-01-18 23:41   ` Torvald Riegel
2021-01-16 20:49 ` [PATCH 4/5] nptl: Make test-cond-printers check the number of waiters Malte Skarupke
2021-01-16 20:49 ` [PATCH 5/5] nptl: Rename __wrefs to __flags because its meaning has changed Malte Skarupke
2021-01-18 23:47   ` Torvald Riegel
2021-01-18 22:43 ` [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Torvald Riegel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).