FAIL nptl/tst-robustpi4


* FAIL nptl/tst-robustpi4
@ 2017-01-26 15:29 Stefan Liebler
  2017-01-26 16:12 ` Carlos O'Donell
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Liebler @ 2017-01-26 15:29 UTC (permalink / raw)
  To: libc-alpha

Hi,

On s390, I've noticed a FAIL in nptl/tst-robustpi4 in about 16 of
10000 iterations of this testcase.
Does anyone else see failures here, too?

If the test fails, I get:
tst-robustpi4: ../nptl/pthread_mutex_lock.c:424: 
__pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) 
!= ESRCH || !robust' failed.
Didn't expect signal from child: got `Aborted'

The mutex is a "robust pi" one, and the futex syscall returned ESRCH for it.
However, the comment before the assertion claims:
/* ESRCH can happen only for non-robust PI mutexes where
    the owner of the lock died.  */

The coredumps show that the tf thread has already finished
and that the do_test thread called pthread_mutex_lock(&m1), which should
return EOWNERDEAD (see nptl/tst-robust1.c:202 e = LOCK (&m1);).
Instead, the futex syscall returns ESRCH, and
(gdb) p/x m1->__data.__lock
= 0xc0000000
= FUTEX_OWNER_DIED | FUTEX_WAITERS
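
A minimal sketch of how such a lock word can be decoded. The bit values follow the Linux futex ABI (FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK); they are redefined here under SKETCH_ names so that the example does not depend on <linux/futex.h>. The helper names are hypothetical and do not come from glibc or the kernel.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit layout of a robust/PI futex word (Linux futex ABI), redefined
   locally for this sketch. */
#define SKETCH_FUTEX_WAITERS    0x80000000u
#define SKETCH_FUTEX_OWNER_DIED 0x40000000u
#define SKETCH_FUTEX_TID_MASK   0x3fffffffu

/* Owner TID stored in the low 30 bits; 0 means "no owner recorded". */
uint32_t lock_word_tid(uint32_t word)
{
    return word & SKETCH_FUTEX_TID_MASK;
}

/* Nonzero if the previous owner died while holding the lock. */
int lock_word_owner_died(uint32_t word)
{
    return (word & SKETCH_FUTEX_OWNER_DIED) != 0;
}

/* Nonzero if at least one waiter has been registered in the word. */
int lock_word_has_waiters(uint32_t word)
{
    return (word & SKETCH_FUTEX_WAITERS) != 0;
}

/* Print a human-readable breakdown of the word. */
void lock_word_print(uint32_t word)
{
    printf("0x%08x: tid=%u owner_died=%d waiters=%d\n",
           (unsigned int) word, (unsigned int) lock_word_tid(word),
           lock_word_owner_died(word), lock_word_has_waiters(word));
}
```

Calling lock_word_print(0xc0000000u) reports a TID of 0 with both flag bits set, which matches the value seen in the coredump: no owner TID is left in the word.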

Furthermore, the coredumps always show an even value in the round variable.
Thus tf is not joined (tst-robust1.c:190) before calling
pthread_mutex_lock (tst-robust1.c:202).

If the do_test thread waits a bit (e.g. by doing something in a loop)
before locking the mutex, the tf thread has already called
__exit_thread (pthread_create.c:478).
In that case m1->__data.__lock is already marked with FUTEX_OWNER_DIED | 0
before the futex syscall is made in pthread_mutex_lock
(pthread_mutex_lock.c:411).
The futex syscall then takes over the mutex by setting __lock to
FUTEX_OWNER_DIED | do_test-TID and returns 0.
If I run the test several times with such a "wait-loop", I see no failures.

If the do_test thread locks the mutex before __exit_thread() is called
in the tf thread, the futex syscall sets the FUTEX_WAITERS bit and blocks
until the tf thread has exited.
Afterwards 0 is returned and m1->__data.__lock is 0xc0000000
= FUTEX_OWNER_DIED | FUTEX_WAITERS.
If I run the test several times with a "wait-loop" before pthread_testcancel
in the tf thread as described, I see no failures either.

It seems that a race between the futex and exit syscalls causes the ESRCH
result from the futex syscall.

I see those fails with Linux 4.8 / 4.9 running in a z/VM guest
as well as with 4.6 on a LPAR (but less often).

Bye
Stefan


* Re: FAIL nptl/tst-robustpi4
  2017-01-26 15:29 FAIL nptl/tst-robustpi4 Stefan Liebler
@ 2017-01-26 16:12 ` Carlos O'Donell
  2017-01-26 16:22   ` Torvald Riegel
  0 siblings, 1 reply; 6+ messages in thread
From: Carlos O'Donell @ 2017-01-26 16:12 UTC (permalink / raw)
  To: Stefan Liebler, libc-alpha

On 01/26/2017 10:29 AM, Stefan Liebler wrote:
> It seems as a race between futex- and exit-syscall causes ESRCH
> result from futex-syscall.
> 
> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
> well as with 4.6 on a LPAR (but less often).

I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
across a wide number of kernels, but never tst-robustpi4.

https://sourceware.org/bugzilla/show_bug.cgi?id=19004

The robustpi support is certainly not very robust as Torvald's
recent fixes show, and there still remains at least one design
flaw that can't be fixed.

e.g.
https://sourceware.org/bugzilla/show_bug.cgi?id=14485

and...
https://sourceware.org/bugzilla/show_bug.cgi?id=19089

-- 
Cheers,
Carlos.


* Re: FAIL nptl/tst-robustpi4
  2017-01-26 16:12 ` Carlos O'Donell
@ 2017-01-26 16:22   ` Torvald Riegel
  2018-06-29  6:55     ` FAIL nptl/tst-robustpi4 [BZ 23183] Stefan Liebler
  0 siblings, 1 reply; 6+ messages in thread
From: Torvald Riegel @ 2017-01-26 16:22 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Stefan Liebler, libc-alpha

On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
> > It seems as a race between futex- and exit-syscall causes ESRCH
> > result from futex-syscall.

I'll have a closer look at this.

> > I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
> > well as with 4.6 on a LPAR (but less often).
> 
> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
> across a wide number of kernels, but never tst-robustpi4.
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
> 
> The robustpi support is certainly not very robust as Torvald's
> recent fixes show, and there still remains at least one design
> flaw that can't be fixed.
> 
> e.g.
> https://sourceware.org/bugzilla/show_bug.cgi?id=14485

The underlying problem for that bug does not affect PI+robust, just
robust, I think.  Unless I forgot about something, PI+robust should
always use the kernel to unlock.




* Re: FAIL nptl/tst-robustpi4 [BZ 23183]
  2017-01-26 16:22   ` Torvald Riegel
@ 2018-06-29  6:55     ` Stefan Liebler
  2018-06-29  7:39       ` Florian Weimer
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Liebler @ 2018-06-29  6:55 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, Carlos O'Donell, Torvald Riegel

On 01/26/2017 05:22 PM, Torvald Riegel wrote:
> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>> It seems as a race between futex- and exit-syscall causes ESRCH
>>> result from futex-syscall.
> 
> I'll have a closer look at this.
> 
>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>> well as with 4.6 on a LPAR (but less often).
>>
>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>> across a wide number of kernels, but never tst-robustpi4.
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>
>> The robustpi support is certainly not very robust as Torvald's
>> recent fixes show, and there still remains at least one design
>> flaw that can't be fixed.
>>
>> e.g.
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
> 
> The underlying problem for that bug does not affect PI+robust, just
> robust, I think.  Unless I forgot about something, PI+robust should
> always use the kernel to unlock.
> 
> 
> 

Hi,

In the meantime, Florian Weimer was also able to reproduce this issue and
opened Bug 23183 - tst-robustpi4 test failure
(https://sourceware.org/bugzilla/show_bug.cgi?id=23183).

I've also dug a bit deeper (see the details in Bugzilla) and was also able
to reproduce it on Intel.

If the thread with locked mutex is executing the exit-syscall
while the main-thread is executing the futex-syscall,
then it could lead to this ESRCH return value of the futex-syscall which 
triggers the assertion.

In this situation, the futex-syscall has already added the FUTEX_WAITERS 
bit to the lock-value and is then calling attach_to_pi_owner().

The exit-syscall is now setting the lock-value to FUTEX_WAITERS | 
FUTEX_OWNER_DIED and is proceeding.

attach_to_pi_owner() is now e.g. trying to get the owner-task and/or is 
testing if the owner is currently exiting. In those cases, ESRCH is 
returned!
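
The interleaving above can be traced on the lock word alone. The following is a user-space model, not kernel code: the function names are hypothetical, and the bit values are the futex ABI constants redefined under SKETCH_ names. It assumes, as described above, that the exit path clears the TID, sets FUTEX_OWNER_DIED and keeps an already set FUTEX_WAITERS bit.

```c
#include <stdint.h>

#define SKETCH_FUTEX_WAITERS    0x80000000u
#define SKETCH_FUTEX_OWNER_DIED 0x40000000u
#define SKETCH_FUTEX_TID_MASK   0x3fffffffu

/* Step 1 (waiter, inside the futex syscall): mark the lock as contended
   before trying to attach to the current owner. */
uint32_t model_waiter_marks_contended(uint32_t uval)
{
    return uval | SKETCH_FUTEX_WAITERS;
}

/* Step 2 (owner, inside the exit syscall): drop the TID, flag the owner
   as dead and preserve the FUTEX_WAITERS bit. */
uint32_t model_owner_exit(uint32_t uval)
{
    return (uval & SKETCH_FUTEX_WAITERS) | SKETCH_FUTEX_OWNER_DIED;
}

/* Step 3 (waiter again): the TID the waiter would try to attach to.
   A result of 0 means there is no owner left to look up. */
uint32_t model_owner_tid(uint32_t uval)
{
    return uval & SKETCH_FUTEX_TID_MASK;
}
```

Starting from a word that holds only an owner TID, steps 1 and 2 in this order produce 0xc0000000, and step 3 then yields a TID of 0.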

Back in glibc, this assertion is triggered:
/* ESRCH can happen only for non-robust PI mutexes where
    the owner of the lock died.  */
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
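
The condition in this assertion is an implication: if the error is ESRCH, the mutex must not be robust. A small sketch of that predicate, with the hypothetical name robust_pi_assertion_holds and with err standing in for the value of INTERNAL_SYSCALL_ERRNO (e, __err):

```c
#include <errno.h>

/* Returns nonzero when the asserted condition holds.
   "err != ESRCH || !robust" reads as
   "err == ESRCH implies that the mutex is not robust". */
int robust_pi_assertion_holds(int err, int robust)
{
    return err != ESRCH || !robust;
}
```

The only combination that fails is ESRCH together with a robust mutex, which is exactly the case reported in this thread.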

The assertion/comment does not agree with the current behaviour of the 
kernel. Any ideas?

Bye
Stefan


* Re: FAIL nptl/tst-robustpi4 [BZ 23183]
  2018-06-29  6:55     ` FAIL nptl/tst-robustpi4 [BZ 23183] Stefan Liebler
@ 2018-06-29  7:39       ` Florian Weimer
  2018-06-29  8:22         ` Stefan Liebler
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Weimer @ 2018-06-29  7:39 UTC (permalink / raw)
  To: Stefan Liebler, libc-alpha; +Cc: Carlos O'Donell, Torvald Riegel

On 06/29/2018 08:54 AM, Stefan Liebler wrote:
> On 01/26/2017 05:22 PM, Torvald Riegel wrote:
>> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>>> It seems as a race between futex- and exit-syscall causes ESRCH
>>>> result from futex-syscall.
>>
>> I'll have a closer look at this.
>>
>>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>>> well as with 4.6 on a LPAR (but less often).
>>>
>>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>>> across a wide number of kernels, but never tst-robustpi4.
>>>
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>>
>>> The robustpi support is certainly not very robust as Torvald's
>>> recent fixes show, and there still remains at least one design
>>> flaw that can't be fixed.
>>>
>>> e.g.
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>>
>> The underlying problem for that bug does not affect PI+robust, just
>> robust, I think.  Unless I forgot about something, PI+robust should
>> always use the kernel to unlock.

> in the meantime, Florian Weimer could also reproduce this issue and 
> opened the bugzilla Bug 23183 - tst-robustpi4 test failure 
> (https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
> 
> I've also dug a bit deeper - see details in bugzilla - and was also able 
> to reproduce it on intel.
> 
> If the thread with locked mutex is executing the exit-syscall
> while the main-thread is executing the futex-syscall,
> then it could lead to this ESRCH return value of the futex-syscall which 
> triggers the assertion.
> 
> In this situation, the futex-syscall has already added the FUTEX_WAITERS 
> bit to the lock-value and is then calling attach_to_pi_owner().
> 
> The exit-syscall is now setting the lock-value to FUTEX_WAITERS | 
> FUTEX_OWNER_DIED and is proceeding.
> 
> attach_to_pi_owner() is now e.g. trying to get the owner-task and/or is 
> testing if the owner is currently exiting. In those cases, ESRCH is 
> returned!

Does the kernel look at the TID and determine that it no longer exists, 
or does it use the FUTEX_OWNER_DIED bit to detect this situation?

I'm worried that using the TID introduces a TID race here.

Thanks,
Florian


* Re: FAIL nptl/tst-robustpi4 [BZ 23183]
  2018-06-29  7:39       ` Florian Weimer
@ 2018-06-29  8:22         ` Stefan Liebler
  0 siblings, 0 replies; 6+ messages in thread
From: Stefan Liebler @ 2018-06-29  8:22 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha; +Cc: Carlos O'Donell, Torvald Riegel

On 06/29/2018 09:39 AM, Florian Weimer wrote:
> On 06/29/2018 08:54 AM, Stefan Liebler wrote:
>> On 01/26/2017 05:22 PM, Torvald Riegel wrote:
>>> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>>>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>>>> It seems as a race between futex- and exit-syscall causes ESRCH
>>>>> result from futex-syscall.
>>>
>>> I'll have a closer look at this.
>>>
>>>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>>>> well as with 4.6 on a LPAR (but less often).
>>>>
>>>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>>>> across a wide number of kernels, but never tst-robustpi4.
>>>>
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>>>
>>>> The robustpi support is certainly not very robust as Torvald's
>>>> recent fixes show, and there still remains at least one design
>>>> flaw that can't be fixed.
>>>>
>>>> e.g.
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>>>
>>> The underlying problem for that bug does not affect PI+robust, just
>>> robust, I think.  Unless I forgot about something, PI+robust should
>>> always use the kernel to unlock.
> 
>> in the meantime, Florian Weimer could also reproduce this issue and 
>> opened the bugzilla Bug 23183 - tst-robustpi4 test failure 
>> (https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
>>
>> I've also dug a bit deeper - see details in bugzilla - and was also 
>> able to reproduce it on intel.
>>
>> If the thread with locked mutex is executing the exit-syscall
>> while the main-thread is executing the futex-syscall,
>> then it could lead to this ESRCH return value of the futex-syscall 
>> which triggers the assertion.
>>
>> In this situation, the futex-syscall has already added the 
>> FUTEX_WAITERS bit to the lock-value and is then calling 
>> attach_to_pi_owner().
>>
>> The exit-syscall is now setting the lock-value to FUTEX_WAITERS | 
>> FUTEX_OWNER_DIED and is proceeding.
>>
>> attach_to_pi_owner() is now e.g. trying to get the owner-task and/or 
>> is testing if the owner is currently exiting. In those cases, ESRCH is 
>> returned!
> 
> Does the kernel look at the TID and determine that it no longer exists, 
> or does it use the FUTEX_OWNER_DIED bit to detect this situation?
> 
> I'm worried that using the TID introduces a TID race here.
> 
> Thanks,
> Florian
> 

There can be different cases.
If the exit-syscall has already called handle_futex_death(), then the
TID is zeroed and the lock value is FUTEX_WAITERS | FUTEX_OWNER_DIED.
Then attach_to_pi_owner() fails with ESRCH due to:
pid_t pid = uval & FUTEX_TID_MASK;
struct task_struct *p;
if (!pid) return -ESRCH;
...
But even if the TID is available, it can fail due to:
	p = find_get_task_by_vpid(pid);
	if (!p) return -ESRCH;

If we've got the task_struct and (p->flags & PF_EXITING) is true, the
result depends on (p->flags & PF_EXITPIDONE): either ESRCH is returned
immediately, or EAGAIN is returned, which leads to calling
attach_to_pi_owner() again. After one or more rounds, it finally returns
ESRCH.
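
The cases listed above can be collected into a small user-space model of the decision logic. This is a sketch, not the kernel's attach_to_pi_owner(): struct model_task, the MODEL_ flag values and the function name are hypothetical, and the task pointer stands in for the result of the TID lookup (NULL when no task is found). It assumes that an exiting task with PF_EXITPIDONE set yields ESRCH and one without it yields EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FUTEX_TID_MASK 0x3fffffffu

/* Flag values for this model only; they are not taken from kernel headers. */
#define MODEL_PF_EXITING     0x1u
#define MODEL_PF_EXITPIDONE  0x2u

struct model_task {
    unsigned int flags;
};

/* Returns 0 if the waiter could attach to the owner, -ESRCH if there is
   no owner to attach to, or -EAGAIN if the caller should retry. */
int model_attach_to_pi_owner(uint32_t uval, const struct model_task *task)
{
    uint32_t pid = uval & MODEL_FUTEX_TID_MASK;

    /* Case 1: the exit path already zeroed the TID in the lock word. */
    if (pid == 0)
        return -ESRCH;

    /* Case 2: the TID is still there, but the task lookup fails. */
    if (task == NULL)
        return -ESRCH;

    /* Case 3: the task exists but is exiting. */
    if (task->flags & MODEL_PF_EXITING)
        return (task->flags & MODEL_PF_EXITPIDONE) ? -ESRCH : -EAGAIN;

    return 0;
}
```

In this model every path that observes a dead or dying owner ends in ESRCH, either directly or after an EAGAIN retry, regardless of whether the mutex is robust.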

