From: malteskarupke@fastmail.fm
To: libc-alpha@sourceware.org
Cc: Malte Skarupke
Subject: [PATCH v3 1/6] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847)
Date: Thu, 6 Oct 2022 17:43:24 -0400
Message-Id: <20221006214329.1084244-2-malteskarupke@fastmail.fm>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20221006214329.1084244-1-malteskarupke@fastmail.fm>
References: <20221006214329.1084244-1-malteskarupke@fastmail.fm>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Malte Skarupke

There was a rare bug in pthread_cond_wait's handling of the case when a
signal was stolen because a waiter took a long time to leave
pthread_cond_wait. I wrote about the bug here:
https://probablydance.com/2020/10/31/using-tla-in-the-real-world-to-understand-a-glibc-bug/
and here:
https://probablydance.com/2022/09/17/finding-the-second-bug-in-glibcs-condition-variable/

The bug was subtle and only happened in an edge-case of an edge-case, so
rather than fixing it directly, I decided to remove the outer edge-case:
by broadening the scope of grefs, stealing of signals becomes impossible.
A signaling thread will always wait for all waiters to leave
pthread_cond_wait before closing a group, so now no waiter from the past
can come back and steal a signal from a future group.

This is patch 1/6; it contains the minimal set of changes necessary to
fix the bug. It leads to an unnecessary number of atomic operations, but
the other patches in this series will undo most of that damage.
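To make the idea concrete, here is a small standalone sketch of the
reference-count gating that this patch widens. It is not glibc code:
toy_group, group_ref_acquire, group_ref_release, group_try_consume_signal
and group_quiesce_and_reuse are names invented for illustration, and it
spins where glibc would futex-wait. The point is only that a group slot is
never reset while any waiter still holds a reference, so a slow waiter
cannot come back later and take a signal that belongs to a future group.

/* Toy model of the group-reference protocol; not glibc's implementation.  */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_group
{
  atomic_uint signals;	/* Low bit: group closed.  Higher bits: signals * 2.  */
  atomic_uint refs;	/* Incremented in steps of 2, like glibc's __g_refs.  */
};

/* Waiter side: take a reference (acquire MO) before looking at the slot.  */
static void
group_ref_acquire (struct toy_group *g)
{
  atomic_fetch_add_explicit (&g->refs, 2, memory_order_acquire);
}

/* Waiter side: drop the reference (release MO) once done with the slot.  */
static void
group_ref_release (struct toy_group *g)
{
  atomic_fetch_sub_explicit (&g->refs, 2, memory_order_release);
}

/* Try to consume one signal; fails if the group has been closed.  */
static bool
group_try_consume_signal (struct toy_group *g)
{
  unsigned int s = atomic_load_explicit (&g->signals, memory_order_acquire);
  while ((s & 1) == 0 && s >= 2)
    if (atomic_compare_exchange_weak_explicit (&g->signals, &s, s - 2,
					       memory_order_acquire,
					       memory_order_relaxed))
      return true;
  return false;
}

/* Reuse side: close the group, then wait for every reference to be dropped
   before resetting the slot.  glibc futex-waits here instead of spinning.  */
static void
group_quiesce_and_reuse (struct toy_group *g)
{
  atomic_fetch_or_explicit (&g->signals, 1, memory_order_release);
  while (atomic_load_explicit (&g->refs, memory_order_acquire) != 0)
    ;	/* Spin until no old waiter can still read the slot.  */
  atomic_store_explicit (&g->signals, 0, memory_order_relaxed);
}

int
main (void)
{
  struct toy_group g;
  atomic_init (&g.signals, 4);	/* Two signals available.  */
  atomic_init (&g.refs, 0);

  group_ref_acquire (&g);			/* Waiter enters.  */
  printf ("consumed a signal: %d\n", group_try_consume_signal (&g));
  group_ref_release (&g);			/* Waiter leaves.  */

  group_quiesce_and_reuse (&g);			/* Safe: refs are zero.  */
  printf ("slot after reuse: %u\n", atomic_load (&g.signals));
  return 0;
}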
---
 nptl/pthread_cond_wait.c | 39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index 20c348a503..7b9116c930 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -408,6 +408,12 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      we only need to synchronize when decrementing the reference count.  */
   unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
   int private = __condvar_get_private (flags);
+  /* Acquire a group reference and use acquire MO for that so that we
+     synchronize with the dummy read-modify-write in
+     __condvar_quiesce_and_switch_g1 if we read from the same group.  This will
+     make us see the closed flag on __g_signals that designates a concurrent
+     attempt to reuse the group's slot.  */
+  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);

   /* Now that we are registered as a waiter, we can release the mutex.
      Waiting on the condvar must be atomic with releasing the mutex, so if
@@ -419,6 +425,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   err = __pthread_mutex_unlock_usercnt (mutex, 0);
   if (__glibc_unlikely (err != 0))
     {
+      __condvar_dec_grefs (cond, g, private);
       __condvar_cancel_waiting (cond, seq, g, private);
       __condvar_confirm_wakeup (cond, private);
       return err;
@@ -470,24 +477,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
         break;

       /* No signals available after spinning, so prepare to block.
-         We first acquire a group reference and use acquire MO for that so
-         that we synchronize with the dummy read-modify-write in
-         __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
-         in this case this will make us see the closed flag on __g_signals
-         that designates a concurrent attempt to reuse the group's slot.
-         We use acquire MO for the __g_signals check to make the
-         __g1_start check work (see spinning above).
-         Note that the group reference acquisition will not mask the
-         release MO when decrementing the reference count because we use
-         an atomic read-modify-write operation and thus extend the release
-         sequence.  */
-      atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
+         First check the closed flag on __g_signals that designates a
+         concurrent attempt to reuse the group's slot.  We use acquire MO for
+         the __g_signals check to make the __g1_start check work (see
+         above).  */
       if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
           || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
         {
-          /* Our group is closed.  Wake up any signalers that might be
-             waiting.  */
-          __condvar_dec_grefs (cond, g, private);
+          /* Our group is closed.  */
           goto done;
         }

@@ -515,10 +512,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
              the lock during cancellation is not possible.  */
           __condvar_cancel_waiting (cond, seq, g, private);
           result = err;
-          goto done;
+          goto confirm_wakeup;
         }
-      else
-        __condvar_dec_grefs (cond, g, private);

       /* Reload signals.  See above for MO.  */
       signals = atomic_load_acquire (cond->__data.__g_signals + g);
@@ -597,9 +592,11 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,

  done:

-  /* Confirm that we have been woken.  We do that before acquiring the mutex
-     to allow for execution of pthread_cond_destroy while having acquired the
-     mutex.  */
+  /* Decrement group reference count and confirm that we have been woken.  We do
+     that before acquiring the mutex to allow for execution of
+     pthread_cond_destroy while having acquired the mutex.  */
+  __condvar_dec_grefs (cond, g, private);
+confirm_wakeup:
   __condvar_confirm_wakeup (cond, private);

   /* Woken up; now re-acquire the mutex.  If this doesn't fail, return RESULT,
-- 
2.25.1
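As a closing note, the user-visible guarantee at stake is the ordinary one
shown below: a signal sent while a waiter is blocked must wake some waiter.
This snippet is not a reproducer for BZ 25847 (the lost wakeup needed a very
specific interleaving inside glibc and many wait/signal rounds); it only
shows the usage pattern that could hang if a signal were stolen. Built with
"gcc -pthread", it should print "waiter woke up".

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready;

static void *
waiter (void *arg)
{
  pthread_mutex_lock (&lock);
  /* The usual predicate loop: spurious wakeups are fine; a lost wakeup is
     not.  With the bug, this wait could sleep forever even though the
     producer had already called pthread_cond_signal.  */
  while (ready == 0)
    pthread_cond_wait (&cond, &lock);
  ready--;
  pthread_mutex_unlock (&lock);
  return arg;
}

int
main (void)
{
  pthread_t t;
  pthread_create (&t, NULL, waiter, NULL);

  pthread_mutex_lock (&lock);
  ready = 1;
  pthread_cond_signal (&cond);	/* Must eventually wake the waiter.  */
  pthread_mutex_unlock (&lock);

  pthread_join (t, NULL);
  puts ("waiter woke up");
  return 0;
}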