public inbox for libc-alpha@sourceware.org
* [PATCH v3 0/6] nptl: Fix pthread_cond_signal missing a sleeper
@ 2022-10-06 21:43 malteskarupke
  2022-10-06 21:43 ` [PATCH v3 1/6] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) malteskarupke
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: malteskarupke @ 2022-10-06 21:43 UTC
  To: libc-alpha

The first patch in this series fixes BZ 25847; the remaining patches
do follow-up clean-up work.

-- New in v3:
Fixed the first patch, which didn't work on its own: it had a bug that
was only fixed by the third patch in the series. Now the series can be
partially applied, and it's fine to stop after any patch.

Also rebased the patches to work on top of 2.36/master.

-- New in v2:
The first patch now has the calls at the end of pthread_cond_wait in
the right order.

The third patch now clears the wake-request flag correctly, removing a
case where a waiter could write to a condvar after it was destroyed.

The fifth patch now renames wrefs to crefs.

I also updated comments as requested.

Finally, I added another patch because I realized that g1_start was too
complex for its reduced role now that signal stealing is no longer
possible, so the final patch cleans that up.





* [PATCH v3 1/6] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847)
  2022-10-06 21:43 [PATCH v3 0/6] nptl: Fix pthread_cond_signal missing a sleeper malteskarupke
@ 2022-10-06 21:43 ` malteskarupke
  2022-10-06 21:43 ` [PATCH v3 2/6] nptl: Remove the signal-stealing code. It is no longer needed malteskarupke
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: malteskarupke @ 2022-10-06 21:43 UTC
  To: libc-alpha; +Cc: Malte Skarupke

From: Malte Skarupke <malteskarupke@fastmail.fm>

There was a rare bug in pthread_cond_wait's handling of the case when
a signal was stolen because a waiter took a long time to leave
pthread_cond_wait.

I wrote about the bug here:
https://probablydance.com/2020/10/31/using-tla-in-the-real-world-to-understand-a-glibc-bug/
and here:
https://probablydance.com/2022/09/17/finding-the-second-bug-in-glibcs-condition-variable/

The bug was subtle and only happened in an edge case of an edge case,
so rather than fixing it directly, I decided to remove the outer edge
case: by broadening the scope of grefs, stealing of signals becomes
impossible. A signaling thread will always wait for all waiters to
leave pthread_cond_wait before closing a group, so now no waiter from
the past can come back and steal a signal from a future group.
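
For illustration only (this is not code from the patch), the broadened
reference lifetime described above can be sketched with C11 atomics.
The type group_refs and the functions waiter_enter, waiter_leave and
group_can_close are hypothetical names. The sketch assumes the same
encoding as __g_refs, where the count is kept in steps of 2 above a
flag bit, and it leaves out the futex blocking that the real code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one __g_refs slot: bit 0 is reserved for a
   wake-request flag, the remaining bits count waiters in steps of 2.  */
typedef struct
{
  atomic_uint refs;
} group_refs;

/* Waiter entry: take the group reference right after registering as a
   waiter, before the mutex is released.  */
static void
waiter_enter (group_refs *g)
{
  atomic_fetch_add_explicit (&g->refs, 2u, memory_order_acquire);
}

/* Waiter exit: drop the reference only at the very end of the wait, so
   the reference covers the whole time spent inside the wait call.  */
static void
waiter_leave (group_refs *g)
{
  unsigned int prev
    = atomic_fetch_sub_explicit (&g->refs, 2u, memory_order_release);
  assert (prev >= 2u);	/* A reference must have been held.  */
}

/* Signaler side: the group may be closed (and its slot reused) only
   once no waiter holds a reference anymore.  */
static bool
group_can_close (group_refs *g)
{
  return (atomic_load_explicit (&g->refs, memory_order_relaxed) >> 1) == 0;
}
```

In this simplified model a waiter that is slow to leave keeps
group_can_close false, which is the property that rules out a waiter
from an old group consuming a signal meant for a later one.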

This is patch 1/6; it contains the minimal set of changes
necessary to fix the bug. This leads to an unnecessary number of
atomic operations, but the other patches in this series will undo
most of that damage.
---
 nptl/pthread_cond_wait.c | 39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 20c348a503..7b9116c930 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -408,6 +408,12 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      we only need to synchronize when decrementing the reference count.  */
   unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
   int private = __condvar_get_private (flags);
+  /* Acquire a group reference and use acquire MO for that so that we
+     synchronize with the dummy read-modify-write in
+     __condvar_quiesce_and_switch_g1 if we read from the same group.  This will
+     make us see the closed flag on __g_signals that designates a concurrent
+     attempt to reuse the group's slot. */
+  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
 
   /* Now that we are registered as a waiter, we can release the mutex.
      Waiting on the condvar must be atomic with releasing the mutex, so if
@@ -419,6 +425,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   err = __pthread_mutex_unlock_usercnt (mutex, 0);
   if (__glibc_unlikely (err != 0))
     {
+      __condvar_dec_grefs (cond, g, private);
       __condvar_cancel_waiting (cond, seq, g, private);
       __condvar_confirm_wakeup (cond, private);
       return err;
@@ -470,24 +477,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	    break;
 
 	  /* No signals available after spinning, so prepare to block.
-	     We first acquire a group reference and use acquire MO for that so
-	     that we synchronize with the dummy read-modify-write in
-	     __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
-	     in this case this will make us see the closed flag on __g_signals
-	     that designates a concurrent attempt to reuse the group's slot.
-	     We use acquire MO for the __g_signals check to make the
-	     __g1_start check work (see spinning above).
-	     Note that the group reference acquisition will not mask the
-	     release MO when decrementing the reference count because we use
-	     an atomic read-modify-write operation and thus extend the release
-	     sequence.  */
-	  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
+	     First check the closed flag on __g_signals that designates a
+	     concurrent attempt to reuse the group's slot. We use acquire MO for
+	     the __g_signals check to make the __g1_start check work (see
+	     above).  */
 	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
 	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
 	    {
-	      /* Our group is closed.  Wake up any signalers that might be
-		 waiting.  */
-	      __condvar_dec_grefs (cond, g, private);
+	      /* Our group is closed.  */
 	      goto done;
 	    }
 
@@ -515,10 +512,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 		 the lock during cancellation is not possible.  */
 	      __condvar_cancel_waiting (cond, seq, g, private);
 	      result = err;
-	      goto done;
+	      goto confirm_wakeup;
 	    }
-	  else
-	    __condvar_dec_grefs (cond, g, private);
 
 	  /* Reload signals.  See above for MO.  */
 	  signals = atomic_load_acquire (cond->__data.__g_signals + g);
@@ -597,9 +592,11 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 
  done:
 
-  /* Confirm that we have been woken.  We do that before acquiring the mutex
-     to allow for execution of pthread_cond_destroy while having acquired the
-     mutex.  */
+  /* Decrement group reference count and confirm that we have been woken.  We do
+     that before acquiring the mutex to allow for execution of
+     pthread_cond_destroy while having acquired the mutex.  */
+  __condvar_dec_grefs (cond, g, private);
+confirm_wakeup:
   __condvar_confirm_wakeup (cond, private);
 
   /* Woken up; now re-acquire the mutex.  If this doesn't fail, return RESULT,
-- 
2.25.1



* [PATCH v3 2/6] nptl: Remove the signal-stealing code. It is no longer needed.
  2022-10-06 21:43 [PATCH v3 0/6] nptl: Fix pthread_cond_signal missing a sleeper malteskarupke
  2022-10-06 21:43 ` [PATCH v3 1/6] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) malteskarupke
@ 2022-10-06 21:43 ` malteskarupke
  2022-10-06 21:43 ` [PATCH v3 3/6] nptl: Optimization by not incrementing wrefs in pthread_cond_wait malteskarupke
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: malteskarupke @ 2022-10-06 21:43 UTC
  To: libc-alpha; +Cc: Malte Skarupke

From: Malte Skarupke <malteskarupke@fastmail.fm>

After my last change, stealing of signals can no longer happen. This
patch removes the code that handled the case when a signal was stolen.
---
 nptl/pthread_cond_wait.c | 63 ----------------------------------------
 1 file changed, 63 deletions(-)

diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 7b9116c930..0502b5ad3f 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -527,69 +527,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
 						&signals, signals - 2));
 
-  /* We consumed a signal but we could have consumed from a more recent group
-     that aliased with ours due to being in the same group slot.  If this
-     might be the case our group must be closed as visible through
-     __g1_start.  */
-  uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
-  if (seq < (g1_start >> 1))
-    {
-      /* We potentially stole a signal from a more recent group but we do not
-	 know which group we really consumed from.
-	 We do not care about groups older than current G1 because they are
-	 closed; we could have stolen from these, but then we just add a
-	 spurious wake-up for the current groups.
-	 We will never steal a signal from current G2 that was really intended
-	 for G2 because G2 never receives signals (until it becomes G1).  We
-	 could have stolen a signal from G2 that was conservatively added by a
-	 previous waiter that also thought it stole a signal -- but given that
-	 that signal was added unnecessarily, it's not a problem if we steal
-	 it.
-	 Thus, the remaining case is that we could have stolen from the current
-	 G1, where "current" means the __g1_start value we observed.  However,
-	 if the current G1 does not have the same slot index as we do, we did
-	 not steal from it and do not need to undo that.  This is the reason
-	 for putting a bit with G2's index into__g1_start as well.  */
-      if (((g1_start & 1) ^ 1) == g)
-	{
-	  /* We have to conservatively undo our potential mistake of stealing
-	     a signal.  We can stop trying to do that when the current G1
-	     changes because other spinning waiters will notice this too and
-	     __condvar_quiesce_and_switch_g1 has checked that there are no
-	     futex waiters anymore before switching G1.
-	     Relaxed MO is fine for the __g1_start load because we need to
-	     merely be able to observe this fact and not have to observe
-	     something else as well.
-	     ??? Would it help to spin for a little while to see whether the
-	     current G1 gets closed?  This might be worthwhile if the group is
-	     small or close to being closed.  */
-	  unsigned int s = atomic_load_relaxed (cond->__data.__g_signals + g);
-	  while (__condvar_load_g1_start_relaxed (cond) == g1_start)
-	    {
-	      /* Try to add a signal.  We don't need to acquire the lock
-		 because at worst we can cause a spurious wake-up.  If the
-		 group is in the process of being closed (LSB is true), this
-		 has an effect similar to us adding a signal.  */
-	      if (((s & 1) != 0)
-		  || atomic_compare_exchange_weak_relaxed
-		       (cond->__data.__g_signals + g, &s, s + 2))
-		{
-		  /* If we added a signal, we also need to add a wake-up on
-		     the futex.  We also need to do that if we skipped adding
-		     a signal because the group is being closed because
-		     while __condvar_quiesce_and_switch_g1 could have closed
-		     the group, it might stil be waiting for futex waiters to
-		     leave (and one of those waiters might be the one we stole
-		     the signal from, which cause it to block using the
-		     futex).  */
-		  futex_wake (cond->__data.__g_signals + g, 1, private);
-		  break;
-		}
-	      /* TODO Back off.  */
-	    }
-	}
-    }
-
  done:
 
   /* Decrement group reference count and confirm that we have been woken.  We do
-- 
2.25.1



* [PATCH v3 3/6] nptl: Optimization by not incrementing wrefs in pthread_cond_wait
  2022-10-06 21:43 [PATCH v3 0/6] nptl: Fix pthread_cond_signal missing a sleeper malteskarupke
  2022-10-06 21:43 ` [PATCH v3 1/6] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) malteskarupke
  2022-10-06 21:43 ` [PATCH v3 2/6] nptl: Remove the signal-stealing code. It is no longer needed malteskarupke
@ 2022-10-06 21:43 ` malteskarupke
  2022-10-06 21:43 ` [PATCH v3 4/6] nptl: Make test-cond-printers check the number of waiters malteskarupke
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: malteskarupke @ 2022-10-06 21:43 UTC
  To: libc-alpha; +Cc: Malte Skarupke

From: Malte Skarupke <malteskarupke@fastmail.fm>

After I broadened the scope of grefs, it covered mostly the same scope
as wrefs. The duplicate atomic increment/decrement was unnecessary. In
this patch I remove the increment/decrement of wrefs.

One exception is the case when pthread_cancel is handled. The
interaction between __condvar_cleanup_waiting and
pthread_cond_destroy is complicated and requires both variables. So in
order to preserve the existing behavior, I now increment/decrement
wrefs in __condvar_cleanup_waiting.
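
As a rough sketch of the arithmetic involved (not the patch's code),
the cancellation guard on wrefs can be modeled as below. The function
names cancel_guard_acquire and cancel_guard_release are hypothetical.
The sketch assumes the __wrefs layout described in this series: bits 0
and 1 hold attributes, bit 2 is the wake-request flag set by
pthread_cond_destroy, and the count starts at bit 3.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take the extra reference before the group reference is dropped, so a
   destroying thread cannot finish while clean-up is still running.  */
static void
cancel_guard_acquire (atomic_uint *wrefs)
{
  atomic_fetch_add_explicit (wrefs, 8u, memory_order_relaxed);
}

/* Drop the extra reference.  Returns true when the caller should wake a
   thread blocked in destroy: the previous value had a count of exactly
   one (bit 3 set, nothing above it) and the wake-request flag (bit 2)
   set, so shifting out the two attribute bits leaves binary 11.  */
static bool
cancel_guard_release (atomic_uint *wrefs)
{
  unsigned int prev
    = atomic_fetch_sub_explicit (wrefs, 8u, memory_order_release);
  assert (prev >= 8u);	/* The guard must have been acquired.  */
  return (prev >> 2) == 3u;
}
```

The attribute bits never affect the result because they are shifted
out before the comparison.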

Another change is that quiesce_and_switch_g1 now clears the
wake-request flag that it sets. The flag used to be cleared when the
last waiter in the old group left pthread_cond_wait. The problem with
this was that it could race with the new pthread_cond_destroy
behavior: the leaving thread would allow pthread_cond_destroy to
finish and then modify the wake-request flag after the destroy. This
was pointed out in the review of an earlier version of this patch,
and the fix is to make quiesce_and_switch_g1 clear its own flag.
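
The flag ownership described above can be sketched as follows. This is
a simplified model, not the patch itself: closer_request_wake,
waiter_drop_ref and closer_clear_flag are hypothetical names, and the
futex wait and wake calls are replaced by return values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Closing thread: set the wake-request flag (bit 0).  Returns the value
   the closer would then sleep on.  */
static unsigned int
closer_request_wake (atomic_uint *refs)
{
  return atomic_fetch_or_explicit (refs, 1u, memory_order_relaxed) | 1u;
}

/* Leaving waiter: drop one reference (counted in steps of 2).  Returns
   true when this was the last reference and a wake-up was requested.
   The waiter does not write to the word again after this decrement.  */
static bool
waiter_drop_ref (atomic_uint *refs)
{
  unsigned int prev
    = atomic_fetch_sub_explicit (refs, 2u, memory_order_release);
  assert (prev >= 2u);	/* A reference must have been held.  */
  return prev == 3u;
}

/* Closing thread: clear the flag it set earlier.  Acquire MO pairs with
   the release decrement done by the waiters.  */
static void
closer_clear_flag (atomic_uint *refs)
{
  atomic_fetch_and_explicit (refs, ~1u, memory_order_acquire);
}
```

Because the waiter's last access is the decrement, a destroying thread
that observes a zero count in this model cannot be followed by a late
write from that waiter.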
---
 nptl/nptl-printers.py          |  5 ++-
 nptl/nptl_lock_constants.pysym |  2 +-
 nptl/pthread_cond_broadcast.c  |  9 ++--
 nptl/pthread_cond_common.c     | 17 +++-----
 nptl/pthread_cond_destroy.c    | 30 +++++++++----
 nptl/pthread_cond_signal.c     | 22 +++++++---
 nptl/pthread_cond_wait.c       | 79 ++++++++++++----------------------
 7 files changed, 83 insertions(+), 81 deletions(-)

diff --git a/nptl/nptl-printers.py b/nptl/nptl-printers.py
index 4890e60058..3fb0335135 100644
--- a/nptl/nptl-printers.py
+++ b/nptl/nptl-printers.py
@@ -313,6 +313,7 @@ class ConditionVariablePrinter(object):
 
         data = cond['__data']
         self.wrefs = data['__wrefs']
+        self.grefs = data['__g_refs']
         self.values = []
 
         self.read_values()
@@ -350,8 +351,10 @@ class ConditionVariablePrinter(object):
         are waiting for it.
         """
 
+        num_readers_g0 = self.grefs[0] >> PTHREAD_COND_GREFS_SHIFT
+        num_readers_g1 = self.grefs[1] >> PTHREAD_COND_GREFS_SHIFT
         self.values.append(('Threads known to still execute a wait function',
-                            self.wrefs >> PTHREAD_COND_WREFS_SHIFT))
+                            num_readers_g0 + num_readers_g1))
 
     def read_attributes(self):
         """Read the condvar's attributes."""
diff --git a/nptl/nptl_lock_constants.pysym b/nptl/nptl_lock_constants.pysym
index ade4398e0c..2141cfa1f0 100644
--- a/nptl/nptl_lock_constants.pysym
+++ b/nptl/nptl_lock_constants.pysym
@@ -50,7 +50,7 @@ PTHREAD_COND_SHARED_MASK          __PTHREAD_COND_SHARED_MASK
 PTHREAD_COND_CLOCK_MONOTONIC_MASK __PTHREAD_COND_CLOCK_MONOTONIC_MASK
 COND_CLOCK_BITS
 -- These values are hardcoded:
-PTHREAD_COND_WREFS_SHIFT          3
+PTHREAD_COND_GREFS_SHIFT          1
 
 -- Rwlock attributes
 PTHREAD_RWLOCK_PREFER_READER_NP
diff --git a/nptl/pthread_cond_broadcast.c b/nptl/pthread_cond_broadcast.c
index 5ae141ac81..e45f6271bf 100644
--- a/nptl/pthread_cond_broadcast.c
+++ b/nptl/pthread_cond_broadcast.c
@@ -39,10 +39,13 @@ ___pthread_cond_broadcast (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_broadcast, 1, cond);
 
-  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
-  if (wrefs >> 3 == 0)
+  /* See pthread_cond_signal for why relaxed MO is enough here. */
+  unsigned int grefs0 = atomic_load_relaxed (cond->__data.__g_refs);
+  unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
+  if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
 
   __condvar_acquire_lock (cond, private);
 
diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
index fb035f72c3..ce09d5d15a 100644
--- a/nptl/pthread_cond_common.c
+++ b/nptl/pthread_cond_common.c
@@ -226,18 +226,15 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
        __g_signals, which will prevent waiters from blocking using a futex on
        __g_signals and also notifies them that the group is closed.  As a
        result, they will eventually remove their group reference, allowing us
-       to close switch group roles.  */
+       to close and switch group roles.  */
 
   /* First, set the closed flag on __g_signals.  This tells waiters that are
      about to wait that they shouldn't do that anymore.  This basically
      serves as an advance notificaton of the upcoming change to __g1_start;
      waiters interpret it as if __g1_start was larger than their waiter
      sequence position.  This allows us to change __g1_start after waiting
-     for all existing waiters with group references to leave, which in turn
-     makes recovery after stealing a signal simpler because it then can be
-     skipped if __g1_start indicates that the group is closed (otherwise,
-     we would have to recover always because waiters don't know how big their
-     groups are).  Relaxed MO is fine.  */
+     for all existing waiters with group references to leave.
+     Relaxed MO is fine.  */
   atomic_fetch_or_relaxed (cond->__data.__g_signals + g1, 1);
 
   /* Wait until there are no group references anymore.  The fetch-or operation
@@ -279,10 +276,10 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
 	  r = atomic_load_relaxed (cond->__data.__g_refs + g1);
 	}
     }
-  /* Acquire MO so that we synchronize with the release operation that waiters
-     use to decrement __g_refs and thus happen after the waiters we waited
-     for.  */
-  atomic_thread_fence_acquire ();
+  /* Clear the wake-request flag. Acquire MO so that we synchronize with the
+     release operation that waiters use to decrement __g_refs and thus happen
+     after the waiters we waited for.  */
+  atomic_fetch_and_acquire (cond->__data.__g_refs + g1, ~(unsigned int)1);
 
   /* Update __g1_start, which finishes closing this group.  The value we add
      will never be negative because old_orig_size can only be zero when we
diff --git a/nptl/pthread_cond_destroy.c b/nptl/pthread_cond_destroy.c
index 42bf04a9f0..053d0a2fbc 100644
--- a/nptl/pthread_cond_destroy.c
+++ b/nptl/pthread_cond_destroy.c
@@ -36,22 +36,36 @@
    signal or broadcast calls.
    Thus, we can assume that all waiters that are still accessing the condvar
    have been woken.  We wait until they have confirmed to have woken up by
-   decrementing __wrefs.  */
+   decrementing __g_refs.  */
 int
 __pthread_cond_destroy (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_destroy, 1, cond);
 
-  /* Set the wake request flag.  We could also spin, but destruction that is
-     concurrent with still-active waiters is probably neither common nor
-     performance critical.  Acquire MO to synchronize with waiters confirming
-     that they finished.  */
-  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4);
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
+  for (unsigned g = 0; g < 2; ++g)
+    {
+      while (true)
+  {
+    /* Set the wake request flag.  We could also spin, but destruction
+       that is concurrent with still-active waiters is probably neither
+       common nor performance critical.  Acquire MO to synchronize with
+       waiters confirming that they finished.  */
+    unsigned r = atomic_fetch_or_acquire (cond->__data.__g_refs + g, 1);
+    r |= 1;
+    if (r == 1)
+      break;
+    futex_wait_simple (cond->__data.__g_refs + g, r, private);
+  }
+    }
+
+  /* Same as above, except to synchronize with canceled threads.  This wake
+     flag never gets cleared, so it's enough to set it once.  */
+  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4) | 4;
   while (wrefs >> 3 != 0)
     {
       futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
-      /* See above.  */
       wrefs = atomic_load_acquire (&cond->__data.__wrefs);
     }
   /* The memory the condvar occupies can now be reused.  */
diff --git a/nptl/pthread_cond_signal.c b/nptl/pthread_cond_signal.c
index 14800ba00b..2e8be2d3b5 100644
--- a/nptl/pthread_cond_signal.c
+++ b/nptl/pthread_cond_signal.c
@@ -35,13 +35,21 @@ ___pthread_cond_signal (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_signal, 1, cond);
 
-  /* First check whether there are waiters.  Relaxed MO is fine for that for
-     the same reasons that relaxed MO is fine when observing __wseq (see
-     below).  */
-  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
-  if (wrefs >> 3 == 0)
+  /* First check whether there are waiters.  Relaxed MO is fine for that, and
+     it doesn't matter that there are two separate loads.  It could only
+     matter if another thread is calling pthread_cond_wait at the same time
+     as this function, but then there is no happens-before relationship with
+     this thread, and the caller can't tell which call came first. If we miss
+     a waiter, the caller would have to assume that pthread_cond_signal got
+     called before pthread_wait, and they have no way of telling otherwise.
+     If they do have a way of telling then there is a happens-before
+     relationship and we're guaranteed to see the waiter here.  */
+  unsigned int grefs0 = atomic_load_relaxed (cond->__data.__g_refs);
+  unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
+  if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
 
   __condvar_acquire_lock (cond, private);
 
@@ -50,7 +58,7 @@ ___pthread_cond_signal (pthread_cond_t *cond)
      1) We can pick any position that is allowed by external happens-before
         constraints.  In particular, if another __pthread_cond_wait call
         happened before us, this waiter must be eligible for being woken by
-        us.  The only way do establish such a happens-before is by signaling
+        us.  The only way to establish such a happens-before is by signaling
         while having acquired the mutex associated with the condvar and
         ensuring that the signal's critical section happens after the waiter.
         Thus, the mutex ensures that we see that waiter's __wseq increase.
diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 0502b5ad3f..b949805dae 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -42,19 +42,6 @@ struct _condvar_cleanup_buffer
 };
 
 
-/* Decrease the waiter reference count.  */
-static void
-__condvar_confirm_wakeup (pthread_cond_t *cond, int private)
-{
-  /* If destruction is pending (i.e., the wake-request flag is nonzero) and we
-     are the last waiter (prior value of __wrefs was 1 << 3), then wake any
-     threads waiting in pthread_cond_destroy.  Release MO to synchronize with
-     these threads.  Don't bother clearing the wake-up request flag.  */
-  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) == 3)
-    futex_wake (&cond->__data.__wrefs, INT_MAX, private);
-}
-
-
 /* Cancel waiting after having registered as a waiter previously.  SEQ is our
    position and G is our group index.
    The goal of cancellation is to make our group smaller if that is still
@@ -150,14 +137,7 @@ __condvar_dec_grefs (pthread_cond_t *cond, unsigned int g, int private)
   /* Release MO to synchronize-with the acquire load in
      __condvar_quiesce_and_switch_g1.  */
   if (atomic_fetch_add_release (cond->__data.__g_refs + g, -2) == 3)
-    {
-      /* Clear the wake-up request flag before waking up.  We do not need more
-	 than relaxed MO and it doesn't matter if we apply this for an aliased
-	 group because we wake all futex waiters right after clearing the
-	 flag.  */
-      atomic_fetch_and_relaxed (cond->__data.__g_refs + g, ~(unsigned int) 1);
-      futex_wake (cond->__data.__g_refs + g, INT_MAX, private);
-    }
+    futex_wake (cond->__data.__g_refs + g, INT_MAX, private);
 }
 
 /* Clean-up for cancellation of waiters waiting for normal signals.  We cancel
@@ -171,6 +151,15 @@ __condvar_cleanup_waiting (void *arg)
   pthread_cond_t *cond = cbuffer->cond;
   unsigned g = cbuffer->wseq & 1;
 
+  /* Normally we are not allowed to touch cond anymore after calling
+     __condvar_dec_grefs, because pthread_cond_destroy looks at __g_refs to
+     determine when all waiters have woken. Since we will do more work in
+     this function, we are using an extra channel to communicate to
+     pthread_cond_destroy that it is not allowed to finish yet: We
+     increment the refcount starting at the fourth bit on __wrefs. Relaxed
+     MO is enough. The synchronization happens because __condvar_dec_grefs
+     uses release MO. */
+  atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
   __condvar_dec_grefs (cond, g, cbuffer->private);
 
   __condvar_cancel_waiting (cond, cbuffer->wseq >> 1, g, cbuffer->private);
@@ -182,7 +171,12 @@ __condvar_cleanup_waiting (void *arg)
      conservatively.  */
   futex_wake (cond->__data.__g_signals + g, 1, cbuffer->private);
 
-  __condvar_confirm_wakeup (cond, cbuffer->private);
+  /* If destruction is pending (i.e., the wake-request flag is nonzero) and we
+     are the last waiter (prior value of __wrefs was 1 << 3), then wake any
+     threads waiting in pthread_cond_destroy.  Release MO to synchronize with
+     these threads.  Don't bother clearing the wake-up request flag.  */
+  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) == 3)
+    futex_wake (&cond->__data.__wrefs, INT_MAX, cbuffer->private);
 
   /* XXX If locking the mutex fails, should we just stop execution?  This
      might be better than silently ignoring the error.  */
@@ -286,20 +280,21 @@ __condvar_cleanup_waiting (void *arg)
    __g1_orig_size: Initial size of G1
      * The two least-significant bits represent the condvar-internal lock.
      * Only accessed while having acquired the condvar-internal lock.
-   __wrefs: Waiter reference counter.
+   __wrefs: Flags and count of waiters who called pthread_cancel.
      * Bit 2 is true if waiters should run futex_wake when they remove the
        last reference.  pthread_cond_destroy uses this as futex word.
      * Bit 1 is the clock ID (0 == CLOCK_REALTIME, 1 == CLOCK_MONOTONIC).
      * Bit 0 is true iff this is a process-shared condvar.
-     * Simple reference count used by both waiters and pthread_cond_destroy.
-     (If the format of __wrefs is changed, update nptl_lock_constants.pysym
-      and the pretty printers.)
+     * Simple reference count used by __condvar_cleanup_waiting and pthread_cond_destroy.
+     (If the format of __wrefs is changed, update the pretty printers.)
    For each of the two groups, we have:
    __g_refs: Futex waiter reference count.
      * LSB is true if waiters should run futex_wake when they remove the
        last reference.
      * Reference count used by waiters concurrently with signalers that have
        acquired the condvar-internal lock.
+     (If the format of __g_refs is changed, update nptl_lock_constants.pysym
+      and the pretty printers.)
    __g_signals: The number of signals that can still be consumed.
      * Used as a futex word by waiters.  Used concurrently by waiters and
        signalers.
@@ -328,18 +323,6 @@ __condvar_cleanup_waiting (void *arg)
    sufficient because if a waiter can see a sufficiently large value, it could
    have also consume a signal in the waiters group.
 
-   Waiters try to grab a signal from __g_signals without holding a reference
-   count, which can lead to stealing a signal from a more recent group after
-   their own group was already closed.  They cannot always detect whether they
-   in fact did because they do not know when they stole, but they can
-   conservatively add a signal back to the group they stole from; if they
-   did so unnecessarily, all that happens is a spurious wake-up.  To make this
-   even less likely, __g1_start contains the index of the current g2 too,
-   which allows waiters to check if there aliasing on the group slots; if
-   there wasn't, they didn't steal from the current G1, which means that the
-   G1 they stole from must have been already closed and they do not need to
-   fix anything.
-
    It is essential that the last field in pthread_cond_t is __g_signals[1]:
    The previous condvar used a pointer-sized field in pthread_cond_t, so a
    PTHREAD_COND_INITIALIZER from that condvar implementation might only
@@ -404,16 +387,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   unsigned int g = wseq & 1;
   uint64_t seq = wseq >> 1;
 
-  /* Increase the waiter reference count.  Relaxed MO is sufficient because
-     we only need to synchronize when decrementing the reference count.  */
-  unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
-  int private = __condvar_get_private (flags);
   /* Acquire a group reference and use acquire MO for that so that we
      synchronize with the dummy read-modify-write in
      __condvar_quiesce_and_switch_g1 if we read from the same group.  This will
      make us see the closed flag on __g_signals that designates a concurrent
      attempt to reuse the group's slot. */
   atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
 
   /* Now that we are registered as a waiter, we can release the mutex.
      Waiting on the condvar must be atomic with releasing the mutex, so if
@@ -427,7 +408,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
     {
       __condvar_dec_grefs (cond, g, private);
       __condvar_cancel_waiting (cond, seq, g, private);
-      __condvar_confirm_wakeup (cond, private);
       return err;
     }
 
@@ -479,8 +459,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	  /* No signals available after spinning, so prepare to block.
 	     First check the closed flag on __g_signals that designates a
 	     concurrent attempt to reuse the group's slot. We use acquire MO for
-	     the __g_signals check to make the __g1_start check work (see
-	     above).  */
+	     the __g_signals check to make sure we read the current value of
+	     __g1_start (see above).  */
 	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
 	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
 	    {
@@ -512,7 +492,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 		 the lock during cancellation is not possible.  */
 	      __condvar_cancel_waiting (cond, seq, g, private);
 	      result = err;
-	      goto confirm_wakeup;
+	      goto acquire_lock;
 	    }
 
 	  /* Reload signals.  See above for MO.  */
@@ -521,9 +501,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 
     }
   /* Try to grab a signal.  Use acquire MO so that we see an up-to-date value
-     of __g1_start below (see spinning above for a similar case).  In
-     particular, if we steal from a more recent group, we will also see a
-     more recent __g1_start below.  */
+     of __g1_start when spinning above.  */
   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
 						&signals, signals - 2));
 
@@ -533,9 +511,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      that before acquiring the mutex to allow for execution of
      pthread_cond_destroy while having acquired the mutex.  */
   __condvar_dec_grefs (cond, g, private);
-confirm_wakeup:
-  __condvar_confirm_wakeup (cond, private);
 
+acquire_lock:
   /* Woken up; now re-acquire the mutex.  If this doesn't fail, return RESULT,
      which is set to ETIMEDOUT if a timeout occured, or zero otherwise.  */
   err = __pthread_mutex_cond_lock (mutex);
-- 
2.25.1



* [PATCH v3 4/6] nptl: Make test-cond-printers check the number of waiters
  2022-10-06 21:43 [PATCH v3 0/6] nptl: Fix pthread_cond_signal missing a sleeper malteskarupke
                   ` (2 preceding siblings ...)
  2022-10-06 21:43 ` [PATCH v3 3/6] nptl: Optimization by not incrementing wrefs in pthread_cond_wait malteskarupke
@ 2022-10-06 21:43 ` malteskarupke
  2022-10-06 21:43 ` [PATCH v3 5/6] nptl: Rename __wrefs to __crefs because its meaning has changed malteskarupke
  2022-10-06 21:43 ` [PATCH v3 6/6] nptl: Cleaning up __g1_start and related code in pthread_cond_wait malteskarupke
  5 siblings, 0 replies; 7+ messages in thread
From: malteskarupke @ 2022-10-06 21:43 UTC (permalink / raw)
  To: libc-alpha; +Cc: Malte Skarupke

From: Malte Skarupke <malteskarupke@fastmail.fm>

My previous change altered how the number of waiters on a condition
variable is determined. The existing test only checked that the
printers print something; it did not cover the case where a thread is
sleeping on the condition variable. This patch changes the test to
ensure that the correct number of waiters is printed.
---
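Note, not part of the commit: a minimal sketch of how a waiter count
could be derived from the two __g_refs words. It assumes that each
waiter adds 2 to its group's word and that the LSB is the wake-request
flag, as the ">> 1" checks in pthread_cond_signal suggest. The helper
name condvar_waiters_sketch is hypothetical and does not exist in glibc.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only.  Each __g_refs word is
   assumed to hold (number of waiters << 1) | wake-request flag, so
   shifting right by one drops the flag and leaves the count.  */
static unsigned int
condvar_waiters_sketch (const unsigned int g_refs[2])
{
  return (g_refs[0] >> 1) + (g_refs[1] >> 1);
}
```

Under these assumptions, one thread asleep in pthread_cond_wait leaves
one group word at 2, so the sketch yields 1, the value checked at the
new 'Test about to signal' breakpoint.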
 nptl/test-cond-printers.c  | 56 +++++++++++++++++++++++++++++++++-----
 nptl/test-cond-printers.py |  5 ++++
 2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/nptl/test-cond-printers.c b/nptl/test-cond-printers.c
index 51d7f920b3..615fd5a681 100644
--- a/nptl/test-cond-printers.c
+++ b/nptl/test-cond-printers.c
@@ -26,7 +26,14 @@
 #define PASS 0
 #define FAIL 1
 
-static int test_status_destroyed (pthread_cond_t *condvar);
+static int test_status (pthread_cond_t *condvar);
+
+typedef struct
+{
+  pthread_mutex_t *mutex;
+  pthread_cond_t *condvar;
+  int *wait_thread_asleep;
+} test_state;
 
 int
 main (void)
@@ -36,22 +43,57 @@ main (void)
   int result = FAIL;
 
   if (pthread_condattr_init (&attr) == 0
-      && test_status_destroyed (&condvar) == PASS)
+      && test_status (&condvar) == PASS)
     result = PASS;
   /* Else, one of the pthread_cond* functions failed.  */
 
   return result;
 }
 
+static void *
+wait (void *arg)
+{
+  test_state *state = (test_state *)arg;
+  void *result = PASS;
+  if (pthread_mutex_lock (state->mutex) != 0)
+    result = (void *)FAIL;
+  *state->wait_thread_asleep = 1;
+  if (pthread_cond_signal (state->condvar) != 0)
+    result = (void *)FAIL;
+  if (pthread_cond_wait (state->condvar, state->mutex) != 0)
+    result = (void *)FAIL;
+  if (pthread_mutex_unlock (state->mutex) != 0)
+    result = (void *)FAIL;
+  return result;
+}
+
 /* Initializes CONDVAR, then destroys it.  */
 static int
-test_status_destroyed (pthread_cond_t *condvar)
+test_status (pthread_cond_t *condvar)
 {
-  int result = FAIL;
+  int result = PASS;
 
-  if (pthread_cond_init (condvar, NULL) == 0
-      && pthread_cond_destroy (condvar) == 0)
-    result = PASS; /* Test status (destroyed).  */
+  pthread_mutex_t mutex;
+  result |= pthread_mutex_init (&mutex, NULL);
+  result |= pthread_cond_init (condvar, NULL);
+  int wait_thread_asleep = 0;
+  test_state state = { &mutex, condvar, &wait_thread_asleep };
+  result |= pthread_mutex_lock (&mutex);
+  pthread_t thread;
+  result |= pthread_create (&thread, NULL, wait, &state);
+  while (!wait_thread_asleep)
+    {
+      result |= pthread_cond_wait (condvar, &mutex);
+    }
+  result |= pthread_cond_signal (condvar); /* Test about to signal */
+  result |= pthread_mutex_unlock (&mutex);
+  result |= pthread_cond_destroy (condvar);
+  void *retval = NULL;
+  result |= pthread_join (thread, &retval);  /* Test status (destroyed).  */
+  result |= pthread_mutex_destroy (&mutex);
+  result = result ? FAIL : PASS;
+  if (retval != NULL)
+    result = FAIL;
 
   return result;
 }
diff --git a/nptl/test-cond-printers.py b/nptl/test-cond-printers.py
index 42329c1691..7945c7a0d5 100644
--- a/nptl/test-cond-printers.py
+++ b/nptl/test-cond-printers.py
@@ -33,6 +33,11 @@ try:
     var = 'condvar'
     to_string = 'pthread_cond_t'
 
+    break_at(test_source, 'Test about to signal')
+    continue_cmd() # Go to test_status
+    test_printer(var, to_string, {'Threads known to still execute a wait function': '1'})
+
+
     break_at(test_source, 'Test status (destroyed)')
     continue_cmd() # Go to test_status_destroyed
     test_printer(var, to_string, {'Threads known to still execute a wait function': '0'})
-- 
2.25.1



* [PATCH v3 5/6] nptl: Rename __wrefs to __crefs because its meaning has changed
  2022-10-06 21:43 [PATCH v3 0/6] nptl: Fix pthread_cond_signal missing a sleeper malteskarupke
                   ` (3 preceding siblings ...)
  2022-10-06 21:43 ` [PATCH v3 4/6] nptl: Make test-cond-printers check the number of waiters malteskarupke
@ 2022-10-06 21:43 ` malteskarupke
  2022-10-06 21:43 ` [PATCH v3 6/6] nptl: Cleaning up __g1_start and related code in pthread_cond_wait malteskarupke
  5 siblings, 0 replies; 7+ messages in thread
From: malteskarupke @ 2022-10-06 21:43 UTC (permalink / raw)
  To: libc-alpha; +Cc: Malte Skarupke

From: Malte Skarupke <malteskarupke@fastmail.fm>

When I removed the increment/decrement of wrefs in pthread_cond_wait,
the field no longer represented the number of waiters. It is still
used as a reference count for threads that call pthread_cancel, so
crefs it is.
---
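Note, not part of the commit: a minimal sketch of the __crefs bit layout
as described in the comment block in pthread_cond_wait.c (bit 0 shared,
bit 1 clock ID, bit 2 wake request, reference count from bit 3 up). The
CREFS_* macros and crefs_* helpers are hypothetical names for
illustration; glibc uses its own masks such as
__PTHREAD_COND_SHARED_MASK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only.  */
#define CREFS_SHARED_BIT  (1u << 0)  /* Process-shared condvar.  */
#define CREFS_CLOCK_BIT   (1u << 1)  /* 0: CLOCK_REALTIME, 1: CLOCK_MONOTONIC.  */
#define CREFS_WAKE_BIT    (1u << 2)  /* pthread_cond_destroy wants a wake-up.  */
#define CREFS_COUNT_SHIFT 3          /* Reference count starts at bit 3.  */

static bool
crefs_is_shared (unsigned int crefs)
{
  return (crefs & CREFS_SHARED_BIT) != 0;
}

static bool
crefs_is_monotonic (unsigned int crefs)
{
  return (crefs & CREFS_CLOCK_BIT) != 0;
}

static unsigned int
crefs_count (unsigned int crefs)
{
  return crefs >> CREFS_COUNT_SHIFT;
}

/* Mirrors the "(prior >> 2) == 3" test in __condvar_cleanup_waiting:
   true when PRIOR holds exactly one reference and the wake flag is set,
   i.e. the caller is dropping the last reference and must futex_wake.  */
static bool
crefs_last_ref_must_wake (unsigned int prior)
{
  return (prior >> 2) == 3;
}
```

Because references are added and removed in steps of 8, the three flag
bits are never disturbed by the count.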
 nptl/nptl-printers.py                   |  6 +++---
 nptl/pthread_cond_broadcast.c           |  2 +-
 nptl/pthread_cond_common.c              |  4 ++--
 nptl/pthread_cond_destroy.c             | 10 +++++-----
 nptl/pthread_cond_init.c                |  4 ++--
 nptl/pthread_cond_signal.c              |  2 +-
 nptl/pthread_cond_wait.c                | 16 ++++++++--------
 nptl/tst-cond22.c                       |  4 ++--
 sysdeps/nptl/bits/thread-shared-types.h |  2 +-
 9 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/nptl/nptl-printers.py b/nptl/nptl-printers.py
index 3fb0335135..981eeedd76 100644
--- a/nptl/nptl-printers.py
+++ b/nptl/nptl-printers.py
@@ -312,7 +312,7 @@ class ConditionVariablePrinter(object):
         """
 
         data = cond['__data']
-        self.wrefs = data['__wrefs']
+        self.crefs = data['__crefs']
         self.grefs = data['__g_refs']
         self.values = []
 
@@ -359,12 +359,12 @@ class ConditionVariablePrinter(object):
     def read_attributes(self):
         """Read the condvar's attributes."""
 
-        if (self.wrefs & PTHREAD_COND_CLOCK_MONOTONIC_MASK) != 0:
+        if (self.crefs & PTHREAD_COND_CLOCK_MONOTONIC_MASK) != 0:
             self.values.append(('Clock ID', 'CLOCK_MONOTONIC'))
         else:
             self.values.append(('Clock ID', 'CLOCK_REALTIME'))
 
-        if (self.wrefs & PTHREAD_COND_SHARED_MASK) != 0:
+        if (self.crefs & PTHREAD_COND_SHARED_MASK) != 0:
             self.values.append(('Shared', 'Yes'))
         else:
             self.values.append(('Shared', 'No'))
diff --git a/nptl/pthread_cond_broadcast.c b/nptl/pthread_cond_broadcast.c
index e45f6271bf..18a3c3553c 100644
--- a/nptl/pthread_cond_broadcast.c
+++ b/nptl/pthread_cond_broadcast.c
@@ -44,7 +44,7 @@ ___pthread_cond_broadcast (pthread_cond_t *cond)
   unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
   if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__crefs);
   int private = __condvar_get_private (flags);
 
   __condvar_acquire_lock (cond, private);
diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
index ce09d5d15a..220db22e9f 100644
--- a/nptl/pthread_cond_common.c
+++ b/nptl/pthread_cond_common.c
@@ -21,7 +21,7 @@
 #include <stdint.h>
 #include <pthread.h>
 
-/* We need 3 least-significant bits on __wrefs for something else.
+/* We need 3 least-significant bits on __crefs for something else.
    This also matches __atomic_wide_counter requirements: The highest
    value we add is __PTHREAD_COND_MAX_GROUP_SIZE << 2 to __g1_start
    (the two extra bits are for the lock in the two LSBs of
@@ -178,7 +178,7 @@ __condvar_set_orig_size (pthread_cond_t *cond, unsigned int size)
     atomic_store_relaxed (&cond->__data.__g1_orig_size, (size << 2) | 2);
 }
 
-/* Returns FUTEX_SHARED or FUTEX_PRIVATE based on the provided __wrefs
+/* Returns FUTEX_SHARED or FUTEX_PRIVATE based on the provided __crefs
    value.  */
 static int __attribute__ ((unused))
 __condvar_get_private (int flags)
diff --git a/nptl/pthread_cond_destroy.c b/nptl/pthread_cond_destroy.c
index 053d0a2fbc..844647b3b7 100644
--- a/nptl/pthread_cond_destroy.c
+++ b/nptl/pthread_cond_destroy.c
@@ -42,7 +42,7 @@ __pthread_cond_destroy (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_destroy, 1, cond);
 
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__crefs);
   int private = __condvar_get_private (flags);
   for (unsigned g = 0; g < 2; ++g)
     {
@@ -62,11 +62,11 @@ __pthread_cond_destroy (pthread_cond_t *cond)
 
   /* Same as above, except to synchronize with canceled threads.  This wake
      flag never gets cleared, so it's enough to set it once.  */
-  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4) | 4;
-  while (wrefs >> 3 != 0)
+  unsigned int crefs = atomic_fetch_or_acquire (&cond->__data.__crefs, 4) | 4;
+  while (crefs >> 3 != 0)
     {
-      futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
-      wrefs = atomic_load_acquire (&cond->__data.__wrefs);
+      futex_wait_simple (&cond->__data.__crefs, crefs, private);
+      crefs = atomic_load_acquire (&cond->__data.__crefs);
     }
   /* The memory the condvar occupies can now be reused.  */
   return 0;
diff --git a/nptl/pthread_cond_init.c b/nptl/pthread_cond_init.c
index 739b3afb2d..d2e1988cf6 100644
--- a/nptl/pthread_cond_init.c
+++ b/nptl/pthread_cond_init.c
@@ -36,13 +36,13 @@ __pthread_cond_init (pthread_cond_t *cond, const pthread_condattr_t *cond_attr)
 
   /* Iff not equal to ~0l, this is a PTHREAD_PROCESS_PRIVATE condvar.  */
   if (icond_attr != NULL && (icond_attr->value & 1) != 0)
-    cond->__data.__wrefs |= __PTHREAD_COND_SHARED_MASK;
+    cond->__data.__crefs |= __PTHREAD_COND_SHARED_MASK;
   int clockid = (icond_attr != NULL
 		 ? ((icond_attr->value >> 1) & ((1 << COND_CLOCK_BITS) - 1))
 		 : CLOCK_REALTIME);
   /* If 0, CLOCK_REALTIME is used; CLOCK_MONOTONIC otherwise.  */
   if (clockid != CLOCK_REALTIME)
-    cond->__data.__wrefs |= __PTHREAD_COND_CLOCK_MONOTONIC_MASK;
+    cond->__data.__crefs |= __PTHREAD_COND_CLOCK_MONOTONIC_MASK;
 
   LIBC_PROBE (cond_init, 2, cond, cond_attr);
 
diff --git a/nptl/pthread_cond_signal.c b/nptl/pthread_cond_signal.c
index 2e8be2d3b5..d3735103fc 100644
--- a/nptl/pthread_cond_signal.c
+++ b/nptl/pthread_cond_signal.c
@@ -48,7 +48,7 @@ ___pthread_cond_signal (pthread_cond_t *cond)
   unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
   if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__crefs);
   int private = __condvar_get_private (flags);
 
   __condvar_acquire_lock (cond, private);
diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index b949805dae..515b4ba60e 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -156,10 +156,10 @@ __condvar_cleanup_waiting (void *arg)
      determine when all waiters have woken. Since we will do more work in
      this function, we are using an extra channel to communicate to
      pthread_cond_destroy that it is not allowed to finish yet: We
-     increment the refcount starting at the fourth bit on __wrefs. Relaxed
+     increment the refcount starting at the fourth bit on __crefs. Relaxed
      MO is enough. The synchronization happens because __condvar_dec_grefs
      uses release MO. */
-  atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
+  atomic_fetch_add_relaxed (&cond->__data.__crefs, 8);
   __condvar_dec_grefs (cond, g, cbuffer->private);
 
   __condvar_cancel_waiting (cond, cbuffer->wseq >> 1, g, cbuffer->private);
@@ -175,8 +175,8 @@ __condvar_cleanup_waiting (void *arg)
      are the last waiter (prior value of __wrefs was 1 << 3), then wake any
      threads waiting in pthread_cond_destroy.  Release MO to synchronize with
      these threads.  Don't bother clearing the wake-up request flag.  */
-  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) == 3)
-    futex_wake (&cond->__data.__wrefs, INT_MAX, cbuffer->private);
+  if ((atomic_fetch_add_release (&cond->__data.__crefs, -8) >> 2) == 3)
+    futex_wake (&cond->__data.__crefs, INT_MAX, cbuffer->private);
 
   /* XXX If locking the mutex fails, should we just stop execution?  This
      might be better than silently ignoring the error.  */
@@ -280,13 +280,13 @@ __condvar_cleanup_waiting (void *arg)
    __g1_orig_size: Initial size of G1
      * The two least-significant bits represent the condvar-internal lock.
      * Only accessed while having acquired the condvar-internal lock.
-   __wrefs: Flags and count of waiters who called pthread_cancel.
+   __crefs: Flags and count of waiters who called pthread_cancel.
      * Bit 2 is true if waiters should run futex_wake when they remove the
        last reference.  pthread_cond_destroy uses this as futex word.
      * Bit 1 is the clock ID (0 == CLOCK_REALTIME, 1 == CLOCK_MONOTONIC).
      * Bit 0 is true iff this is a process-shared condvar.
      * Simple reference count used by __condvar_cleanup_waiting and pthread_cond_destroy.
-     (If the format of __wrefs is changed, update the pretty printers.)
+     (If the format of __crefs is changed, update the pretty printers.)
    For each of the two groups, we have:
    __g_refs: Futex waiter reference count.
      * LSB is true if waiters should run futex_wake when they remove the
@@ -393,7 +393,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      make us see the closed flag on __g_signals that designates a concurrent
      attempt to reuse the group's slot. */
   atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__crefs);
   int private = __condvar_get_private (flags);
 
   /* Now that we are registered as a waiter, we can release the mutex.
@@ -548,7 +548,7 @@ ___pthread_cond_timedwait64 (pthread_cond_t *cond, pthread_mutex_t *mutex,
 
   /* Relaxed MO is suffice because clock ID bit is only modified
      in condition creation.  */
-  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__crefs);
   clockid_t clockid = (flags & __PTHREAD_COND_CLOCK_MONOTONIC_MASK)
                     ? CLOCK_MONOTONIC : CLOCK_REALTIME;
   return __pthread_cond_wait_common (cond, mutex, clockid, abstime);
diff --git a/nptl/tst-cond22.c b/nptl/tst-cond22.c
index 1336e9c79d..9f8cfea5c3 100644
--- a/nptl/tst-cond22.c
+++ b/nptl/tst-cond22.c
@@ -113,7 +113,7 @@ do_test (void)
 	  c.__data.__g1_start.__value32.__low,
 	  c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0],
 	  c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1],
-	  c.__data.__g1_orig_size, c.__data.__wrefs);
+	  c.__data.__g1_orig_size, c.__data.__crefs);
 
   if (pthread_create (&th, NULL, tf, (void *) 1l) != 0)
     {
@@ -159,7 +159,7 @@ do_test (void)
 	  c.__data.__g1_start.__value32.__low,
 	  c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0],
 	  c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1],
-	  c.__data.__g1_orig_size, c.__data.__wrefs);
+	  c.__data.__g1_orig_size, c.__data.__crefs);
 
   return status;
 }
diff --git a/sysdeps/nptl/bits/thread-shared-types.h b/sysdeps/nptl/bits/thread-shared-types.h
index 5653507e55..52decc49d6 100644
--- a/sysdeps/nptl/bits/thread-shared-types.h
+++ b/sysdeps/nptl/bits/thread-shared-types.h
@@ -98,7 +98,7 @@ struct __pthread_cond_s
   unsigned int __g_refs[2] __LOCK_ALIGNMENT;
   unsigned int __g_size[2];
   unsigned int __g1_orig_size;
-  unsigned int __wrefs;
+  unsigned int __crefs;
   unsigned int __g_signals[2];
 };
 
-- 
2.25.1



* [PATCH v3 6/6] nptl: Cleaning up __g1_start and related code in pthread_cond_wait
  2022-10-06 21:43 [PATCH v3 0/6] nptl: Fix pthread_cond_signal missing a sleeper malteskarupke
                   ` (4 preceding siblings ...)
  2022-10-06 21:43 ` [PATCH v3 5/6] nptl: Rename __wrefs to __crefs because its meaning has changed malteskarupke
@ 2022-10-06 21:43 ` malteskarupke
  5 siblings, 0 replies; 7+ messages in thread
From: malteskarupke @ 2022-10-06 21:43 UTC (permalink / raw)
  To: libc-alpha; +Cc: Malte Skarupke

From: Malte Skarupke <malteskarupke@fastmail.fm>

After my previous changes, __g1_start can be simpler. It cannot change
while a thread is in pthread_cond_wait, because groups are only allowed
to switch when all threads in g1 have left that function. So there is no
need to check if it has changed in pthread_cond_wait.

Once __g1_start is no longer read in pthread_cond_wait, it is only
read in code that holds the internal lock of the condition variable, so
__g1_start no longer needs to be atomic.

Finally, the low bit of __g1_start was only used in the block that was
supposed to handle potential stealing of signals. Since I deleted that
block, we can stop shifting the count in __g1_start.
---
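Note, not part of the commit: a minimal sketch comparing the old and new
__g1_start updates done when switching groups. It assumes the old
encoding was (start << 1) | index of the current G2, as the removed
comments describe. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: the value is (start << 1) | G2 index.  Switching groups
   advances the start by OLD_ORIG_SIZE and flips the low bit.  */
static uint64_t
old_g1_start_after_switch (uint64_t encoded, unsigned int g1,
                           unsigned int old_orig_size)
{
  encoded += (uint64_t) old_orig_size << 1;
  if (g1 == 1)
    encoded += 1;  /* G2 index goes from 0 to 1.  */
  else
    encoded -= 1;  /* G2 index goes from 1 to 0.  */
  return encoded;
}

/* New scheme: the value is the plain start position.  */
static uint64_t
new_g1_start_after_switch (uint64_t start, unsigned int old_orig_size)
{
  return start + old_orig_size;
}

/* The closed-group check used in __condvar_cancel_waiting: a waiter
   with sequence number SEQ is in a closed group if G1 starts after it.  */
static int
group_is_closed (uint64_t g1_start, uint64_t seq)
{
  return g1_start > seq;
}
```

In this sketch, shifting the old value right by one always gives the
same start position as the new plain counter, which is why callers can
drop the ">> 1".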
 nptl/pthread_cond_common.c              | 40 +++++-----------------
 nptl/pthread_cond_wait.c                | 45 +++++--------------------
 nptl/tst-cond22.c                       | 10 +++---
 sysdeps/nptl/bits/thread-shared-types.h |  2 +-
 sysdeps/nptl/pthread.h                  |  2 +-
 5 files changed, 23 insertions(+), 76 deletions(-)

diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
index 220db22e9f..9f3e71fd69 100644
--- a/nptl/pthread_cond_common.c
+++ b/nptl/pthread_cond_common.c
@@ -21,11 +21,7 @@
 #include <stdint.h>
 #include <pthread.h>
 
-/* We need 3 least-significant bits on __crefs for something else.
-   This also matches __atomic_wide_counter requirements: The highest
-   value we add is __PTHREAD_COND_MAX_GROUP_SIZE << 2 to __g1_start
-   (the two extra bits are for the lock in the two LSBs of
-   __g1_start).  */
+/* We need 3 least-significant bits on __crefs for something else. */
 #define __PTHREAD_COND_MAX_GROUP_SIZE ((unsigned) 1 << 29)
 
 static inline uint64_t
@@ -40,18 +36,6 @@ __condvar_fetch_add_wseq_acquire (pthread_cond_t *cond, unsigned int val)
   return __atomic_wide_counter_fetch_add_acquire (&cond->__data.__wseq, val);
 }
 
-static inline uint64_t
-__condvar_load_g1_start_relaxed (pthread_cond_t *cond)
-{
-  return __atomic_wide_counter_load_relaxed (&cond->__data.__g1_start);
-}
-
-static inline void
-__condvar_add_g1_start_relaxed (pthread_cond_t *cond, unsigned int val)
-{
-  __atomic_wide_counter_add_relaxed (&cond->__data.__g1_start, val);
-}
-
 #if __HAVE_64B_ATOMICS == 1
 
 static inline uint64_t
@@ -210,7 +194,7 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
      behavior.
      Note that this works correctly for a zero-initialized condvar too.  */
   unsigned int old_orig_size = __condvar_get_orig_size (cond);
-  uint64_t old_g1_start = __condvar_load_g1_start_relaxed (cond) >> 1;
+  uint64_t old_g1_start = cond->__data.__g1_start;
   if (((unsigned) (wseq - old_g1_start - old_orig_size)
 	  + cond->__data.__g_size[g1 ^ 1]) == 0)
 	return false;
@@ -240,10 +224,10 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
   /* Wait until there are no group references anymore.  The fetch-or operation
      injects us into the modification order of __g_refs; release MO ensures
      that waiters incrementing __g_refs after our fetch-or see the previous
-     changes to __g_signals and to __g1_start that had to happen before we can
-     switch this G1 and alias with an older group (we have two groups, so
-     aliasing requires switching group roles twice).  Note that nobody else
-     can have set the wake-request flag, so we do not have to act upon it.
+     change to __g_signals that had to happen before we can switch this G1
+     and alias with an older group (we have two groups, so aliasing requires
+     switching group roles twice).  Note that nobody else can have set the
+     wake-request flag, so we do not have to act upon it.
 
      Also note that it is harmless if older waiters or waiters from this G1
      get a group reference after we have quiesced the group because it will
@@ -281,15 +265,9 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
      after the waiters we waited for.  */
   atomic_fetch_and_acquire (cond->__data.__g_refs + g1, ~(unsigned int)1);
 
-  /* Update __g1_start, which finishes closing this group.  The value we add
-     will never be negative because old_orig_size can only be zero when we
-     switch groups the first time after a condvar was initialized, in which
-     case G1 will be at index 1 and we will add a value of 1.  See above for
-     why this takes place after waiting for quiescence of the group.
-     Relaxed MO is fine because the change comes with no additional
-     constraints that others would have to observe.  */
-  __condvar_add_g1_start_relaxed (cond,
-      (old_orig_size << 1) + (g1 == 1 ? 1 : - 1));
+  /* Update __g1_start, which finishes closing this group.  See above for
+     why this takes place after waiting for quiescence of the group.  */
+  cond->__data.__g1_start += old_orig_size;
 
   /* Now reopen the group, thus enabling waiters to again block using the
      futex controlled by __g_signals.  Release MO so that observers that see
diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 515b4ba60e..7fb3dfb5ac 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -71,7 +71,7 @@ __condvar_cancel_waiting (pthread_cond_t *cond, uint64_t seq, unsigned int g,
      not hold a reference on the group.  */
   __condvar_acquire_lock (cond, private);
 
-  uint64_t g1_start = __condvar_load_g1_start_relaxed (cond) >> 1;
+  uint64_t g1_start = cond->__data.__g1_start;
   if (g1_start > seq)
     {
       /* Our group is closed, so someone provided enough signals for it.
@@ -274,9 +274,8 @@ __condvar_cleanup_waiting (void *arg)
      * Waiters fetch-add while having acquire the mutex associated with the
        condvar.  Signalers load it and fetch-xor it concurrently.
    __g1_start: Starting position of G1 (inclusive)
-     * LSB is index of current G2.
-     * Modified by signalers while having acquired the condvar-internal lock
-       and observed concurrently by waiters.
+     * Modified by signalers and observed by waiters, both only while having
+       acquired the condvar-internal lock.
    __g1_orig_size: Initial size of G1
      * The two least-significant bits represent the condvar-internal lock.
      * Only accessed while having acquired the condvar-internal lock.
@@ -313,16 +312,6 @@ __condvar_cleanup_waiting (void *arg)
    A PTHREAD_COND_INITIALIZER condvar has all fields set to zero, which yields
    a condvar that has G2 starting at position 0 and a G1 that is closed.
 
-   Because waiters do not claim ownership of a group right when obtaining a
-   position in __wseq but only reference count the group when using futexes
-   to block, it can happen that a group gets closed before a waiter can
-   increment the reference count.  Therefore, waiters have to check whether
-   their group is already closed using __g1_start.  They also have to perform
-   this check when spinning when trying to grab a signal from __g_signals.
-   Note that for these checks, using relaxed MO to load __g1_start is
-   sufficient because if a waiter can see a sufficiently large value, it could
-   have also consume a signal in the waiters group.
-
    It is essential that the last field in pthread_cond_t is __g_signals[1]:
    The previous condvar used a pointer-sized field in pthread_cond_t, so a
    PTHREAD_COND_INITIALIZER from that condvar implementation might only
@@ -414,8 +403,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   /* Now wait until a signal is available in our group or it is closed.
      Acquire MO so that if we observe a value of zero written after group
      switching in __condvar_quiesce_and_switch_g1, we synchronize with that
-     store and will see the prior update of __g1_start done while switching
-     groups too.  */
+     store.  */
   unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g);
 
   do
@@ -435,11 +423,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	  unsigned int spin = maxspin;
 	  while (signals == 0 && spin > 0)
 	    {
-	      /* Check that we are not spinning on a group that's already
-		 closed.  */
-	      if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
-		goto done;
-
 	      /* TODO Back off.  */
 
 	      /* Reload signals.  See above for MO.  */
@@ -456,19 +439,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	  if (signals != 0)
 	    break;
 
-	  /* No signals available after spinning, so prepare to block.
-	     First check the closed flag on __g_signals that designates a
-	     concurrent attempt to reuse the group's slot. We use acquire MO for
-	     the __g_signals check to make sure we read the current value of
-	     __g1_start (see above).  */
-	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
-	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
-	    {
-	      /* Our group is closed.  */
-	      goto done;
-	    }
-
-	  // Now block.
+	  // No signals available after spinning, so block.
 	  struct _pthread_cleanup_buffer buffer;
 	  struct _condvar_cleanup_buffer cbuffer;
 	  cbuffer.wseq = wseq;
@@ -500,9 +471,9 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	}
 
     }
-  /* Try to grab a signal.  Use acquire MO so that we see an up-to-date value
-     of __g1_start when spinning above.  */
-  while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
+  /* Try to grab a signal.  Relaxed MO is enough because the group can't be
+     closed while we're in this loop, so there are no writes we could miss.  */
+  while (!atomic_compare_exchange_weak_relaxed (cond->__data.__g_signals + g,
 						&signals, signals - 2));
 
  done:
diff --git a/nptl/tst-cond22.c b/nptl/tst-cond22.c
index 9f8cfea5c3..f5ee62e639 100644
--- a/nptl/tst-cond22.c
+++ b/nptl/tst-cond22.c
@@ -106,11 +106,10 @@ do_test (void)
       status = 1;
     }
 
-  printf ("cond = { 0x%x:%x, 0x%x:%x, %u/%u/%u, %u/%u/%u, %u, %u }\n",
+  printf ("cond = { 0x%x:%x, %llu, %u/%u/%u, %u/%u/%u, %u, %u }\n",
 	  c.__data.__wseq.__value32.__high,
 	  c.__data.__wseq.__value32.__low,
-	  c.__data.__g1_start.__value32.__high,
-	  c.__data.__g1_start.__value32.__low,
+	  c.__data.__g1_start,
 	  c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0],
 	  c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1],
 	  c.__data.__g1_orig_size, c.__data.__crefs);
@@ -152,11 +151,10 @@ do_test (void)
       status = 1;
     }
 
-  printf ("cond = { 0x%x:%x, 0x%x:%x, %u/%u/%u, %u/%u/%u, %u, %u }\n",
+  printf ("cond = { 0x%x:%x, %llu, %u/%u/%u, %u/%u/%u, %u, %u }\n",
 	  c.__data.__wseq.__value32.__high,
 	  c.__data.__wseq.__value32.__low,
-	  c.__data.__g1_start.__value32.__high,
-	  c.__data.__g1_start.__value32.__low,
+	  c.__data.__g1_start,
 	  c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0],
 	  c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1],
 	  c.__data.__g1_orig_size, c.__data.__crefs);
diff --git a/sysdeps/nptl/bits/thread-shared-types.h b/sysdeps/nptl/bits/thread-shared-types.h
index 52decc49d6..52234b1512 100644
--- a/sysdeps/nptl/bits/thread-shared-types.h
+++ b/sysdeps/nptl/bits/thread-shared-types.h
@@ -94,7 +94,7 @@ typedef struct __pthread_internal_slist
 struct __pthread_cond_s
 {
   __atomic_wide_counter __wseq;
-  __atomic_wide_counter __g1_start;
+  unsigned long long int __g1_start;
   unsigned int __g_refs[2] __LOCK_ALIGNMENT;
   unsigned int __g_size[2];
   unsigned int __g1_orig_size;
diff --git a/sysdeps/nptl/pthread.h b/sysdeps/nptl/pthread.h
index dedad4ec86..b0ddbe2ee3 100644
--- a/sysdeps/nptl/pthread.h
+++ b/sysdeps/nptl/pthread.h
@@ -152,7 +152,7 @@ enum
 
 
 /* Conditional variable handling.  */
-#define PTHREAD_COND_INITIALIZER { { {0}, {0}, {0, 0}, {0, 0}, 0, 0, {0, 0} } }
+#define PTHREAD_COND_INITIALIZER { { {0}, 0, {0, 0}, {0, 0}, 0, 0, {0, 0} } }
 
 
 /* Cleanup buffers */
-- 
2.25.1

