public inbox for libc-ports@sourceware.org
 help / color / mirror / Atom feed
* Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue
       [not found] <1395409800-4457-1-git-send-email-bniebuhr@efjohnson.com>
@ 2014-03-27 20:31 ` Maxim Kuvyrkov
  2014-03-27 20:54   ` Will Newton
  0 siblings, 1 reply; 6+ messages in thread
From: Maxim Kuvyrkov @ 2014-03-27 20:31 UTC (permalink / raw)
  To: bniebuhr; +Cc: uclibc, libc-ports

[CC: libc-ports as glibc's ARM and sparc32 lowlevellock.c need same patch.]

On Mar 22, 2014, at 2:50 AM, bniebuhr@efjohnson.com wrote:

> From: Brian Niebuhr <bniebuhr@efjohnson.com>
> 
> __lll_timedlock_wait has a bug that is exposed in these conditions:
> 
> 1. Thread 1 acquires the lock
> 2. Thread 2 attempts to acquire the lock and waits via futex syscall
> 3. Thread 1 unlocks the lock and wakes the waiting thread
> 4. Thread 1 re-aquires the lock before thread 2 gets the CPU
> 
> What happens in this case is that the futex value is set to '1' since
> Thread 1 reaquired the lock through the fast path (since it had just been
> released).  The problem is that Thread 2 is in the loop in
> __lll_timedlock_wait and it expects the futex value to be '2'.
> atomic_compare_and_exchange_bool_acq only sets the futex value to '2' if
> it is already set to '0', and since Thread 1 reaquired the lock, the
> futex remains set to '1'.
> 
> When Thread 2 attempts to wait on the futex, the operating system returns
> -EWOULDBLOCK since the futex value is not '2'.  This causes a busy wait
> condition where Thread 2 continuously attempts to wait on the futex and
> the kernel immediately returns -EWOULDBLOCK.  This continues until Thread
> 1 releases the lock.
> 
> The fix is to use atomic_exchange_acq instead of
> atomic_compare_and_exchange_bool_acq which will force the futex value to
> '2' on each loop iteration.

You know, I started this reply as a you-are-wrong-here-is-why, but after looking at both glibc and uclibc, I agree that you are correct.  Just to reiterate for others, the problem here is not correctness, but busy-wait instead of sleep under certain conditions.

>  This change makes uClibc line up with
> glibc's implementation.

This problem is fixed in glibc's ./nptl/sysdeps/unix/sysv/linux/lowlevellock.c, but still present in glibc's ARM and sparc32 lowlevellock.c.  Do you plan to fix these too?

Interestingly, glibc's hppa lowlevellock.c has an alternative solution to this problem.  I can't quite figure out which solution is faster, so would appreciate a second pair of eyes.  My feeling is that the generic (the one in your patch) version is faster.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

> 
> Signed-off-by: Brian Niebuhr <bniebuhr@efjohnson.com>
> ---
> libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c b/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
> index af864b3..4f7e890 100644
> --- a/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
> +++ b/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
> @@ -60,10 +60,7 @@ __lll_timedlock_wait (int *futex, const struct timespec *abstime, int private)
>     return EINVAL;
> 
>   /* Upgrade the lock.  */
> -  if (atomic_exchange_acq (futex, 2) == 0)
> -    return 0;
> -
> -  do
> +  while (atomic_exchange_acq (futex, 2) != 0)
>     {
>       struct timeval tv;
> 
> @@ -86,7 +83,6 @@ __lll_timedlock_wait (int *futex, const struct timespec *abstime, int private)
>       // XYZ: Lost the lock to check whether it was private.
>       lll_futex_timed_wait (futex, 2, &rt, private);
>     }
> -  while (atomic_compare_and_exchange_bool_acq (futex, 2, 0) != 0);
> 
>   return 0;
> }
> -- 
> 1.8.3.2
> 
> _______________________________________________
> uClibc mailing list
> uClibc@uclibc.org
> http://lists.busybox.net/mailman/listinfo/uclibc

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue
  2014-03-27 20:31 ` [PATCH] Fix __lll_timedlock_wait busy-wait issue Maxim Kuvyrkov
@ 2014-03-27 20:54   ` Will Newton
  2014-03-27 22:01     ` Joseph S. Myers
  0 siblings, 1 reply; 6+ messages in thread
From: Will Newton @ 2014-03-27 20:54 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: bniebuhr, uclibc, libc-ports, libc-alpha

And CC libc-alpha, as libc-ports has mostly become disused.

On 27 March 2014 20:31, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> [CC: libc-ports as glibc's ARM and sparc32 lowlevellock.c need same patch.]
>
> On Mar 22, 2014, at 2:50 AM, bniebuhr@efjohnson.com wrote:
>
>> From: Brian Niebuhr <bniebuhr@efjohnson.com>
>>
>> __lll_timedlock_wait has a bug that is exposed in these conditions:
>>
>> 1. Thread 1 acquires the lock
>> 2. Thread 2 attempts to acquire the lock and waits via futex syscall
>> 3. Thread 1 unlocks the lock and wakes the waiting thread
>> 4. Thread 1 re-aquires the lock before thread 2 gets the CPU
>>
>> What happens in this case is that the futex value is set to '1' since
>> Thread 1 reaquired the lock through the fast path (since it had just been
>> released).  The problem is that Thread 2 is in the loop in
>> __lll_timedlock_wait and it expects the futex value to be '2'.
>> atomic_compare_and_exchange_bool_acq only sets the futex value to '2' if
>> it is already set to '0', and since Thread 1 reaquired the lock, the
>> futex remains set to '1'.
>>
>> When Thread 2 attempts to wait on the futex, the operating system returns
>> -EWOULDBLOCK since the futex value is not '2'.  This causes a busy wait
>> condition where Thread 2 continuously attempts to wait on the futex and
>> the kernel immediately returns -EWOULDBLOCK.  This continues until Thread
>> 1 releases the lock.
>>
>> The fix is to use atomic_exchange_acq instead of
>> atomic_compare_and_exchange_bool_acq which will force the futex value to
>> '2' on each loop iteration.
>
> You know, I started this reply as a you-are-wrong-here-is-why, but after looking at both glibc and uclibc, I agree that you are correct.  Just to reiterate for others, the problem here is not correctness, but busy-wait instead of sleep under certain conditions.
>
>>  This change makes uClibc line up with
>> glibc's implementation.
>
> This problem is fixed in glibc's ./nptl/sysdeps/unix/sysv/linux/lowlevellock.c, but still present in glibc's ARM and sparc32 lowlevellock.c.  Do you plan to fix these too?
>
> Interestingly, glibc's hppa lowlevellock.c has an alternative solution to this problem.  I can't quite figure out which solution is faster, so would appreciate a second pair of eyes.  My feeling is that the generic (the one in your patch) version is faster.
>
> Thank you,
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>>
>> Signed-off-by: Brian Niebuhr <bniebuhr@efjohnson.com>
>> ---
>> libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c | 6 +-----
>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c b/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
>> index af864b3..4f7e890 100644
>> --- a/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
>> +++ b/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
>> @@ -60,10 +60,7 @@ __lll_timedlock_wait (int *futex, const struct timespec *abstime, int private)
>>     return EINVAL;
>>
>>   /* Upgrade the lock.  */
>> -  if (atomic_exchange_acq (futex, 2) == 0)
>> -    return 0;
>> -
>> -  do
>> +  while (atomic_exchange_acq (futex, 2) != 0)
>>     {
>>       struct timeval tv;
>>
>> @@ -86,7 +83,6 @@ __lll_timedlock_wait (int *futex, const struct timespec *abstime, int private)
>>       // XYZ: Lost the lock to check whether it was private.
>>       lll_futex_timed_wait (futex, 2, &rt, private);
>>     }
>> -  while (atomic_compare_and_exchange_bool_acq (futex, 2, 0) != 0);
>>
>>   return 0;
>> }
>> --
>> 1.8.3.2
>>
>> _______________________________________________
>> uClibc mailing list
>> uClibc@uclibc.org
>> http://lists.busybox.net/mailman/listinfo/uclibc
>



-- 
Will Newton
Toolchain Working Group, Linaro

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue
  2014-03-27 20:54   ` Will Newton
@ 2014-03-27 22:01     ` Joseph S. Myers
  2014-03-27 22:14       ` Maxim Kuvyrkov
  2014-03-28 18:25       ` Torvald Riegel
  0 siblings, 2 replies; 6+ messages in thread
From: Joseph S. Myers @ 2014-03-27 22:01 UTC (permalink / raw)
  To: Will Newton; +Cc: Maxim Kuvyrkov, bniebuhr, uclibc, libc-ports, libc-alpha

I don't know how this might relate to 
<https://sourceware.org/bugzilla/show_bug.cgi?id=15119> (see 
<https://sourceware.org/ml/libc-ports/2013-01/msg00084.html> and 
<https://sourceware.org/ml/libc-ports/2013-02/msg00021.html> and the rest 
of that thread).  But my preference for how to address this is definitely 
to move to unifying lowlevellock.[ch] files across as many architectures 
as possible - which requires someone to understand the differences and 
produce a careful analysis that shows what the best form for generic files 
is and what cases actually require architecture-specific files to override 
those generic files (preferably overriding only the bits that need 
overriding).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue
  2014-03-27 22:01     ` Joseph S. Myers
@ 2014-03-27 22:14       ` Maxim Kuvyrkov
  2014-03-27 22:19         ` Joseph S. Myers
  2014-03-28 18:25       ` Torvald Riegel
  1 sibling, 1 reply; 6+ messages in thread
From: Maxim Kuvyrkov @ 2014-03-27 22:14 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Will Newton, bniebuhr, uclibc, libc-ports, libc-alpha

On Mar 28, 2014, at 11:01 AM, Joseph S. Myers <joseph@codesourcery.com> wrote:

> I don't know how this might relate to 
> <https://sourceware.org/bugzilla/show_bug.cgi?id=15119> (see 
> <https://sourceware.org/ml/libc-ports/2013-01/msg00084.html> and 
> <https://sourceware.org/ml/libc-ports/2013-02/msg00021.html> and the rest 
> of that thread).  But my preference for how to address this is definitely 
> to move to unifying lowlevellock.[ch] files across as many architectures 
> as possible - which requires someone to understand the differences and 
> produce a careful analysis that shows what the best form for generic files 
> is and what cases actually require architecture-specific files to override 
> those generic files (preferably overriding only the bits that need 
> overriding).

Yeap, it's the same issue in the PR and same solution as in this thread.  Unfortunately, the previous discussion veered off towards sparc away from ARM and got forgotten.

I agree that unifying lowlevellock.c implementation is the way forward.  At the very least I will make sure that ARM doesn't have unnecessary divergence from generic lowlevellock.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue
  2014-03-27 22:14       ` Maxim Kuvyrkov
@ 2014-03-27 22:19         ` Joseph S. Myers
  0 siblings, 0 replies; 6+ messages in thread
From: Joseph S. Myers @ 2014-03-27 22:19 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Will Newton, bniebuhr, uclibc, libc-ports, libc-alpha

On Fri, 28 Mar 2014, Maxim Kuvyrkov wrote:

> On Mar 28, 2014, at 11:01 AM, Joseph S. Myers <joseph@codesourcery.com> wrote:
> 
> > I don't know how this might relate to 
> > <https://sourceware.org/bugzilla/show_bug.cgi?id=15119> (see 
> > <https://sourceware.org/ml/libc-ports/2013-01/msg00084.html> and 
> > <https://sourceware.org/ml/libc-ports/2013-02/msg00021.html> and the rest 
> > of that thread).  But my preference for how to address this is definitely 
> > to move to unifying lowlevellock.[ch] files across as many architectures 
> > as possible - which requires someone to understand the differences and 
> > produce a careful analysis that shows what the best form for generic files 
> > is and what cases actually require architecture-specific files to override 
> > those generic files (preferably overriding only the bits that need 
> > overriding).
> 
> Yeap, it's the same issue in the PR and same solution as in this thread.  
> Unfortunately, the previous discussion veered off towards sparc away 
> from ARM and got forgotten.

The present thread is specifically discussing lowlevellock.c, but Carlos 
suggested in the previous discussion that the real issue was in 
__lll_timedlock in lowlevellock.h.  I think both files need unification 
across architectures.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue
  2014-03-27 22:01     ` Joseph S. Myers
  2014-03-27 22:14       ` Maxim Kuvyrkov
@ 2014-03-28 18:25       ` Torvald Riegel
  1 sibling, 0 replies; 6+ messages in thread
From: Torvald Riegel @ 2014-03-28 18:25 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Will Newton, Maxim Kuvyrkov, bniebuhr, uclibc, libc-ports, libc-alpha

On Thu, 2014-03-27 at 22:01 +0000, Joseph S. Myers wrote:
> I don't know how this might relate to 
> <https://sourceware.org/bugzilla/show_bug.cgi?id=15119> (see 
> <https://sourceware.org/ml/libc-ports/2013-01/msg00084.html> and 
> <https://sourceware.org/ml/libc-ports/2013-02/msg00021.html> and the rest 
> of that thread).  But my preference for how to address this is definitely 
> to move to unifying lowlevellock.[ch] files across as many architectures 
> as possible - which requires someone to understand the differences and 
> produce a careful analysis that shows what the best form for generic files 
> is and what cases actually require architecture-specific files to override 
> those generic files (preferably overriding only the bits that need 
> overriding).
> 

I agree.  My gut feeling is that the locks should eventually become
unified C code, using atomics to do the synchronization;
architecture-specific code should be either in the atomics or in more
generally useful spin-waiting code (which could be used by other sync
constructs as well).  The futex syscall is really on the slowpath; if
you hit it, you will have had at least a cache miss on the futex var,
and doing the syscall will likely give you more cache misses.
Therefore, I don't see a reason why the futex syscall needs to have
custom asm implementations such as on x86 currently.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-03-28 18:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1395409800-4457-1-git-send-email-bniebuhr@efjohnson.com>
2014-03-27 20:31 ` [PATCH] Fix __lll_timedlock_wait busy-wait issue Maxim Kuvyrkov
2014-03-27 20:54   ` Will Newton
2014-03-27 22:01     ` Joseph S. Myers
2014-03-27 22:14       ` Maxim Kuvyrkov
2014-03-27 22:19         ` Joseph S. Myers
2014-03-28 18:25       ` Torvald Riegel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).