From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847)
From: Torvald Riegel
To: Malte Skarupke, libc-alpha@sourceware.org
Cc: malteskarupke@fastmail.fm
Date: Mon, 18 Jan 2021 23:43:06 +0100
Message-ID: <52eb6c8ce3fcd0cbc00667a946cf7191a68dcfd1.camel@redhat.com>
In-Reply-To: <20210116204950.16434-1-malteskarupke@web.de>
References: <20210116204950.16434-1-malteskarupke@web.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
List-Id: Libc-alpha mailing list
On Sat, 2021-01-16 at 15:49 -0500, Malte Skarupke wrote:
> This change is the minimal amount of changes necessary to fix the bug.
> This leads to slightly slower performance, but the next two patches
> in this series will undo most of that damage.

Is this based on experiments, or an assumption based on reasoning?
Which kinds of workloads are you talking about (e.g., high vs. low
contention, ...)?

> ---
>  nptl/pthread_cond_wait.c | 29 +++++++++++------------------
>  1 file changed, 11 insertions(+), 18 deletions(-)
>
> diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
> index 02d11c61db..0f50048c0b 100644
> --- a/nptl/pthread_cond_wait.c
> +++ b/nptl/pthread_cond_wait.c
> @@ -405,6 +405,10 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
>    unsigned int g = wseq & 1;
>    uint64_t seq = wseq >> 1;
>
> +  /* Acquire a group reference and use acquire MO for that so that we
> +     synchronize with the dummy read-modify-write in
> +     __condvar_quiesce_and_switch_g1 if we read from that.  */
> +  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);

Please explain the choice of MO properly in comments, unless it's
obvious.  In this example, you just state that you want to have the
synchronize-with relation, but not why.  The comment you broke up has
the second part ("In turn, ...") that explains why we want to have it.

But have you checked that moving the reference acquisition to earlier
will still be correct, in particular regarding MOs?  Your model didn't
include MOs, IIRC, so this needs reasoning.  We also want to make sure
that our future selves can still reconstruct this understanding without
having to start from scratch, so we really need good comments.
>    /* Increase the waiter reference count.  Relaxed MO is sufficient because
>       we only need to synchronize when decrementing the reference count.  */
>    unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
> @@ -422,6 +426,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
>      {
>        __condvar_cancel_waiting (cond, seq, g, private);
>        __condvar_confirm_wakeup (cond, private);
> +      __condvar_dec_grefs (cond, g, private);
>        return err;
>      }
>
> @@ -471,24 +476,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
>        break;
>
>      /* No signals available after spinning, so prepare to block.
> -       We first acquire a group reference and use acquire MO for that so
> -       that we synchronize with the dummy read-modify-write in
> -       __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
> -       in this case this will make us see the closed flag on __g_signals
> -       that designates a concurrent attempt to reuse the group's slot.
> -       We use acquire MO for the __g_signals check to make the
> -       __g1_start check work (see spinning above).
> -       Note that the group reference acquisition will not mask the
> -       release MO when decrementing the reference count because we use
> -       an atomic read-modify-write operation and thus extend the release
> -       sequence.  */

You lose this last sentence, but it matters when explaining this.

> -    atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
> +       First check the closed flag on __g_signals that designates a
> +       concurrent attempt to reuse the group's slot.  We use acquire MO for
> +       the __g_signals check to make the __g1_start check work (see
> +       spinning above).  */

See above.

>      if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
>          || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
>        {
> -        /* Our group is closed.  Wake up any signalers that might be
> -           waiting.  */
> -        __condvar_dec_grefs (cond, g, private);
> +        /* Our group is closed.  */
>          goto done;
>        }
>
> @@ -508,7 +503,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
>
>      if (__glibc_unlikely (err == ETIMEDOUT || err == EOVERFLOW))
>        {
> -        __condvar_dec_grefs (cond, g, private);
>          /* If we timed out, we effectively cancel waiting.  Note that
>             we have decremented __g_refs before cancellation, so that a
>             deadlock between waiting for quiescence of our group in
>          result = err;
>          goto done;
>        }
> -    else
> -      __condvar_dec_grefs (cond, g, private);
>
>      /* Reload signals.  See above for MO.  */
>      signals = atomic_load_acquire (cond->__data.__g_signals + g);
> @@ -602,6 +594,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
>      to allow for execution of pthread_cond_destroy while having acquired the
>      mutex.  */
>    __condvar_confirm_wakeup (cond, private);
> +  __condvar_dec_grefs (cond, g, private);

This is the wrong order.  After confirming the wakeup for
cond_destroy, the thread must not touch the condvar memory anymore
(including reading it, because destruction could unmap it, for
example; futex calls on it are fine).