Race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* Race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP.
@ 2018-05-24 14:39 Stefan Liebler
  2018-06-02 21:32 ` Torvald Riegel
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Liebler @ 2018-05-24 14:39 UTC (permalink / raw)
  To: GNU C Library

[-- Attachment #1: Type: text/plain, Size: 4652 bytes --]

Hi,

while looking for a possible reason for "pthread_mutex_lock.c:79: 
__pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.", 
I've found a race condition if pthread_mutex_lock() is called for the 
"first time" by multiple threads.
Note: Lock elision has to be supported and enabled (export 
GLIBC_TUNABLES=glibc.elision.enable=1).

The attached test program creates threads which runs multiple iterations 
of the following sequence:
-pthread_mutex_init(): called only by one thread. Initializing with 
default mutex attributes.
-pthread_mutex_lock()
-pthread_mutex_unlock()
-pthread_mutex_destroy(): called only by one thread.

Here are some code snippets in order to follow the explanation below:
-nptl/pthread_mutex_lock.c:
  62int
  63__pthread_mutex_lock (pthread_mutex_t *mutex)
  64{
  65  unsigned int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
  ...
  73  if (__glibc_likely (type == PTHREAD_MUTEX_TIMED_NP))
  74    {
  75      FORCE_ELISION (mutex, goto elision);
  76    simple:
  77      /* Normal mutex.  */
  78      LLL_MUTEX_LOCK (mutex);
  79      assert (mutex->__data.__owner == 0);
  80    }
  81#ifdef HAVE_ELISION
  82  else if (__glibc_likely (type == PTHREAD_MUTEX_TIMED_ELISION_NP))
  83    {
  84  elision: __attribute__((unused))
  85      /* This case can never happen on a system without elision, 

  86         as the mutex type initialization functions will not 

  87         allow to set the elision flags.  */
  88      /* Don't record owner or users for elision case.  This is a 

  89         tail call.  */
  90      return LLL_MUTEX_LOCK_ELISION (mutex);
  91    }
  92#endif
...
156  /* Record the ownership.  */
157  mutex->__data.__owner = id;
158#ifndef NO_INCR
159  ++mutex->__data.__nusers;
160#endif


-sysdeps/unix/sysv/linux/s390/force-elision.h:
Note: This file is identical to the equivalent files for x86 and powerpc!
19/* Automatically enable elision for existing user lock kinds.  */
20#define FORCE_ELISION(m, s) \
21  if (__pthread_force_elision \
22      && (m->__data.__kind & PTHREAD_MUTEX_ELISION_FLAGS_NP) == 0) \
23    { \
24      mutex->__data.__kind |= PTHREAD_MUTEX_ELISION_NP; \
25      s; \
26    }

-nptl/pthread_mutex_unlock.c:
  36__pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr)
  37{
  38  int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
...
  43  if (__builtin_expect (type, PTHREAD_MUTEX_TIMED_NP)
  44      == PTHREAD_MUTEX_TIMED_NP)
  45    {
  46      /* Always reset the owner field.  */
  47    normal:
  48      mutex->__data.__owner = 0;
  49      if (decr)
  50        /* One less user.  */
  51        --mutex->__data.__nusers;
...
  60  else if (__glibc_likely (type == PTHREAD_MUTEX_TIMED_ELISION_NP))
  61    {
  62      /* Don't reset the owner/users fields for elision.  */
  63      return lll_unlock_elision (mutex->__data.__lock, 
mutex->__data.__elision,
  64                                      PTHREAD_MUTEX_PSHARED (mutex));

-nptl/pthread_mutex_destroy.c:
26__pthread_mutex_destroy (pthread_mutex_t *mutex)
27{
28  LIBC_PROBE (mutex_destroy, 1, mutex);
29
30  if ((mutex->__data.__kind & PTHREAD_MUTEX_ROBUST_NORMAL_NP) == 0
31      && mutex->__data.__nusers != 0)
32    return EBUSY;



Assumption: Two threads are calling pthread_mutex_lock at the same time 
and have already loaded the mutex-type in line 65 with the type 
PTHREAD_MUTEX_TIMED_NP.

Thread 1 promotes the mutex to PTHREAD_MUTEX_ELISION_NP (see 
FORCE_ELISION in line 24) and jumps to LLL_MUTEX_LOCK_ELISION (see line 90).

Thread 2 is checking the mutex type in FORCE_ELISION (line 22).
As the mutex-type is already set to PTHREAD_MUTEX_ELISION_NP, the 
condition is false and the "Normal mutex" (line 77) is processed.
Note: The "normal mutex" records the ownership (see line 156).

Both threads are calling pthread_mutex_unlock and are loading mutex-type 
in line 38 with the type PTHREAD_MUTEX_TIMED_ELISION_NP.
Thus the ownership is not resetted (see line 62)!

The call to pthread_mutex_destroy is returning with EBUSY
as __nusers == 1 (see line 31).
Note: Although thread 2 has successfully called pthread_mutex_unlock, it 
is marked as the current owner of the mutex.

If a third thread has also loaded the mutex-type PTHREAD_MUTEX_TIMED_NP 
in pthread_mutex_lock, then the "assert (mutex->__data.__owner == 0)" in 
line 79 is triggered.



When I am running the attached test program on s390x, I can trigger the 
case "pthread_mutex_destroy()==EBUSY" within some thousand iterations.
And I could trigger the assertion while single-stepping in GDB.

Can somebody test and confirm this on x86, power, s390?

How can we solve this issue?

Bye
Stefan

[-- Attachment #2: tst_pthread_lock_assertion.c --]
[-- Type: text/x-csrc, Size: 3961 bytes --]

//CFLAGS=
//LDFLAGS=-lpthread

#define CHECK_LOCK_ELISION 1

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#if CHECK_LOCK_ELISION != 0
# ifdef __s390x__
#  include <sys/auxv.h>
#  include <htmintrin.h> /* Needed for builtin_tx_nesting_depth.  */
# endif
#endif

#define NUM_THREADS 3

typedef struct thr_info
{
  int nr;
  pthread_t thread;
} thr_info_t;

static pthread_barrier_t barrier;
static pthread_mutex_t mutex;
static int loop_count = -1;

static void
check_fail (const char *msg, thr_info_t *thr, int ret, int *info_round)
{
  if (ret != 0)
    {
      printf ("#%d: %s: failed with %d;",
	      (thr == NULL) ? -1 : thr->nr, msg, ret);
      if (info_round != NULL)
	printf (" in round=%d;", *info_round);
      puts ("");

      abort ();
    }
}

static void *
thr_func (void *arg)
{
  int ret, i;
  thr_info_t *thr = (thr_info_t *) arg;
  printf ("#%d: started\n", thr->nr);

  for (i = 0; i < loop_count; i++)
    {
      if (thr->nr == 0)
	{
	  ret = pthread_mutex_init (&mutex, NULL);
	  check_fail ("pthread_mutex_init", thr, ret, &i);
	}

      pthread_barrier_wait (&barrier);

      ret = pthread_mutex_lock (&mutex);
      check_fail ("pthread_mutex_lock", thr, ret, &i);

      ret = pthread_mutex_unlock (&mutex);
      check_fail ("pthread_mutex_unlock", thr, ret, &i);

      pthread_barrier_wait (&barrier);

      if (thr->nr == 0)
	{
	  ret = pthread_mutex_destroy (&mutex);
	  check_fail ("pthread_mutex_destroy", thr, ret, &i);

	  if ((i % 10000) == 0)
	    {
	      printf (".");
	      fflush (stdout);
	    }
	}

      pthread_barrier_wait (&barrier);
    }
  return NULL;
}

#if CHECK_LOCK_ELISION != 0
static int
check_if_lock_elision_is_available ()
{
  int success = EXIT_FAILURE;
# ifdef __s390x__
  if ((getauxval (AT_HWCAP) & HWCAP_S390_TE) == 0)
    {
      puts ("This machine does not support transactional execution!\n"
	    "You need a s390x CPU >= zEC12 and a lpar or a z/VM guest which "
	    "supports transactions!\n");
    }

  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  int nesting_depth_in_lock = -1;
  int ret;
  ret = pthread_mutex_lock (&m);
  check_fail ("check-elision; lock", NULL, ret, NULL);

  __asm__ (".machine push \n\t"
	   ".machine \"all\"");
  nesting_depth_in_lock = __builtin_tx_nesting_depth ();
  __asm__ (".machine pop");

  ret = pthread_mutex_unlock (&m);
  check_fail ("check-elision; unlock", NULL, ret, NULL);

  if (nesting_depth_in_lock > 0)
    {
      puts ("The lock was elided!");
      success = EXIT_SUCCESS;
    }
  else
    {
      puts ("The lock was NOT elided!\n"
	    "Perhaps you've forgot to set the environment variable:\n"
	    "GLIBC_TUNABLES=glibc.elision.enable=1");
    }

# else
  puts ("Please add a check if lock-elision is available on your architecture. "
	"The check in check_if_lock_elision_is_available () assumes, "
	"that lock-elision is enabled!\n\n");
  success = EXIT_SUCCESS;
# endif

  return success;
}
#endif

int
main (int argc, const char *argv[])
{
  int i;
  int ret;

  if (argc == 2)
    {
      loop_count = strtol (argv[1], NULL, 0);
    }

  if (loop_count <= 0)
    loop_count = 2000000;

#if CHECK_LOCK_ELISION != 0
  check_if_lock_elision_is_available ();
#endif

  printf ("main: start %d threads to run %d iterations.\n",
	  NUM_THREADS, loop_count);

  ret = pthread_barrier_init (&barrier, NULL, NUM_THREADS);
  check_fail ("pthread_barrier_init", NULL, ret, NULL);

  thr_info_t thrs[NUM_THREADS];
  for (i = 0; i < NUM_THREADS; i++)
    {
      thrs[i].nr = i;
      ret = pthread_create (&(thrs[i].thread), NULL, thr_func, &(thrs[i]));
      check_fail ("pthread_create", NULL, ret, NULL);
    }

  for (i = 0; i < NUM_THREADS; i++)
    {
      ret = pthread_join (thrs[i].thread, NULL);
      check_fail ("pthread_join", NULL, ret, NULL);
    }

  ret = pthread_barrier_destroy (&barrier);
  check_fail ("pthread_barrier_destroy", NULL, ret, NULL);

  printf ("main: end.\n");
  return EXIT_SUCCESS;
}

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP.
  2018-05-24 14:39 Race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP Stefan Liebler
@ 2018-06-02 21:32 ` Torvald Riegel
  2018-06-12 14:26   ` Stefan Liebler
  0 siblings, 1 reply; 3+ messages in thread
From: Torvald Riegel @ 2018-06-02 21:32 UTC (permalink / raw)
  To: Stefan Liebler; +Cc: GNU C Library, Carlos O'Donell

On Thu, 2018-05-24 at 16:39 +0200, Stefan Liebler wrote:
> How can we solve this issue?

Good catch.  I don't have the time currently to look into this in
detail, but the underlying problem is that __kind is concurrently read
and modified, so we need to have a plan for how to deal with it,
document that (concurrency notes), and use atomics to avoid the data
races and make everyone aware of this field being concurrently accessed
data.

Regarding how to solve this: Maybe we should force elision only when
unlocking, so that the change happens within the critical section and
doesn't affect further operations within that critical section.  We'd
still need to check all the paths on which concurrent threads also
trying to acquire the lock may have old information (ie, elision bit not
set); my guess would be that it should usually be fine if they believe
elision is not yet enabled if they read __kind again after acquiring the
lock.

Not updating __owner or __nusers if elision is enabled may also be an
option (the mutex types that require checks aren't compatible with
elision).

Another approach may be to cache __pthread_force_elision in the mutex
itself but not by messing with __kind in a way that affects the handling
of the mutex.

More generally, it would be great if we could clean up the mutex code,
remove some of the special cases and checks that aren't required by
POSIX.  In practice, I guess that many people simply don't check return
codes anyway -- one simply expects locking to work.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP.
  2018-06-02 21:32 ` Torvald Riegel
@ 2018-06-12 14:26   ` Stefan Liebler
  0 siblings, 0 replies; 3+ messages in thread
From: Stefan Liebler @ 2018-06-12 14:26 UTC (permalink / raw)
  To: GNU C Library; +Cc: Torvald Riegel, Carlos O'Donell

Hi,

I've opened the "Bug 23275 - Race in pthread_mutex_lock while promoting 
to PTHREAD_MUTEX_ELISION_NP" and posted a patch.
Please have a look at it and answer to that post:
"[PATCH] Fix race in pthread_mutex_lock while promoting to 
PTHREAD_MUTEX_ELISION_NP [BZ #23275]"
(https://www.sourceware.org/ml/libc-alpha/2018-06/msg00246.html)

On 06/02/2018 11:31 PM, Torvald Riegel wrote:
> On Thu, 2018-05-24 at 16:39 +0200, Stefan Liebler wrote:
>> How can we solve this issue?
> 
> Good catch.  I don't have the time currently to look into this in
> detail, but the underlying problem is that __kind is concurrently read
> and modified, so we need to have a plan for how to deal with it,
> document that (concurrency notes), and use atomics to avoid the data
> races and make everyone aware of this field being concurrently accessed
> data.
> 
> Regarding how to solve this: Maybe we should force elision only when
> unlocking, so that the change happens within the critical section and
> doesn't affect further operations within that critical section.  We'd
> still need to check all the paths on which concurrent threads also
> trying to acquire the lock may have old information (ie, elision bit not
> set); my guess would be that it should usually be fine if they believe
> elision is not yet enabled if they read __kind again after acquiring the
> lock.
If we force elision when unlocking, __kind is changed within the 
critical section, but __kind is read in pthread_mutex_lock before 
entering the critical section. Then there could be other threads in 
pthread_mutex_lock - in normal-path - waiting for the futex. That's fine 
and yes, I think only one thread is within the critical section. BUT 
then the __owner, __nusers fields are set while locking, but not while 
unlocking.
> 
> Not updating __owner or __nusers if elision is enabled may also be an
> option (the mutex types that require checks aren't compatible with
> elision).
If the mutex is using the elision path, then those fields aren't 
updated. If the race occures, then pthread_mutex_lock - in normal-path - 
updates those fields, but pthread_mutex_unlock - in elision-path - does 
not. This leads to the mentioned symptoms.
> 
> Another approach may be to cache __pthread_force_elision in the mutex
> itself but not by messing with __kind in a way that affects the handling
> of the mutex.
> 
> More generally, it would be great if we could clean up the mutex code,
> remove some of the special cases and checks that aren't required by
> POSIX.  In practice, I guess that many people simply don't check return
> codes anyway -- one simply expects locking to work.
> 

Bye
Stefan

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-06-12 14:26 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24 14:39 Race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP Stefan Liebler
2018-06-02 21:32 ` Torvald Riegel
2018-06-12 14:26   ` Stefan Liebler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).