* FAIL nptl/tst-robustpi4
@ 2017-01-26 15:29 Stefan Liebler
2017-01-26 16:12 ` Carlos O'Donell
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Liebler @ 2017-01-26 15:29 UTC (permalink / raw)
To: libc-alpha
Hi,
On s390, I've noticed a FAIL in nptl/tst-robustpi4 in about 16 of
10000 iterations of this testcase.
Does anyone else see failures here, too?
If the test fails, I get:
tst-robustpi4: ../nptl/pthread_mutex_lock.c:424:
__pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err)
!= ESRCH || !robust' failed.
Didn't expect signal from child: got `Aborted'
The mutex is a "robust pi" one, so the assertion can only fail because the
futex syscall returned ESRCH here.
However, the comment before the assertion claims:
/* ESRCH can happen only for non-robust PI mutexes where
the owner of the lock died. */
The coredumps show that the tf-thread has already finished and that the
do_test-thread called pthread_mutex_lock(&m1), which should return
EOWNERDEAD (see nptl/tst-robust1.c:202 e = LOCK (&m1);).
Instead, the futex syscall returns ESRCH, and the lock word is:
(gdb) p/x m1->__data.__lock
= 0xc0000000
= FUTEX_OWNER_DIED | FUTEX_WAITERS
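A minimal sketch of how that lock word decomposes, assuming the bit layout from <linux/futex.h> (FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK). The constants are redefined locally so the snippet builds anywhere, and the names lock_word, decode_lock_word and demo_decode_lock_word are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the values in <linux/futex.h>. */
#define SKETCH_FUTEX_WAITERS    UINT32_C(0x80000000)
#define SKETCH_FUTEX_OWNER_DIED UINT32_C(0x40000000)
#define SKETCH_FUTEX_TID_MASK   UINT32_C(0x3fffffff)

/* Hypothetical helper type: the three fields packed into __lock. */
struct lock_word {
    int waiters;     /* nonzero if FUTEX_WAITERS is set */
    int owner_died;  /* nonzero if FUTEX_OWNER_DIED is set */
    uint32_t tid;    /* owner TID, 0 if no owner is recorded */
};

static struct lock_word decode_lock_word(uint32_t v)
{
    struct lock_word w;
    w.waiters = (v & SKETCH_FUTEX_WAITERS) != 0;
    w.owner_died = (v & SKETCH_FUTEX_OWNER_DIED) != 0;
    w.tid = v & SKETCH_FUTEX_TID_MASK;
    return w;
}

/* Usage: the value seen in the coredumps has both flags and no TID. */
static void demo_decode_lock_word(void)
{
    struct lock_word w = decode_lock_word(UINT32_C(0xc0000000));
    assert(w.waiters && w.owner_died && w.tid == 0);
}
```

The point of the decode is the TID field: with 0xc0000000 there is no owner TID left in the lock word at all.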
Furthermore, the coredumps always show an even value in the round variable.
Thus tf has not been joined (tst-robust1.c:190) before
pthread_mutex_lock is called (tst-robust1.c:202).
If the do_test-thread waits a bit (e.g. doing something in a loop)
before locking the mutex, the tf-thread has already called
__exit_thread(pthread_create.c:478).
Then m1->__data.__lock is already marked with FUTEX_OWNER_DIED | 0
before calling the futex syscall in pthread_mutex_lock
(pthread_mutex_lock.c:411).
Then the futex syscall takes over the mutex by setting __lock to
FUTEX_OWNER_DIED | do_test-TID and returns 0.
If I run the test several times with such a "wait-loop", I see no failures.
If the do_test-thread locks the mutex before __exit_thread() is called
in tf-thread, the futex syscall sets FUTEX_WAITERS bit and blocks until
tf-thread has exited.
Afterwards 0 is returned and m1->__data.__lock is 0xc0000000
= FUTEX_OWNER_DIED | FUTEX_WAITERS.
If I run the test several times with a "wait-loop" before pthread_testcancel
in the tf-thread, as described, I see no failures either.
It seems that a race between the futex and exit syscalls causes the ESRCH
result from the futex syscall.
I see these failures with Linux 4.8 / 4.9 running in a z/VM guest
as well as with 4.6 on an LPAR (but less often).
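For reference, a rough sketch of the race-free variant of this scenario: a robust PI mutex whose owner exits without unlocking, followed by a lock attempt from another thread after the owner has been joined. This is not the tst-robust1.c code; lock_and_exit, lock_after_owner_exit and the demo function are hypothetical names, and the join is there deliberately so that the snippet does not depend on the timing discussed above:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: take the mutex and exit without unlocking it. */
static void *lock_and_exit(void *arg)
{
    pthread_mutex_t *m = arg;
    (void) pthread_mutex_lock(m);
    return NULL;
}

/* Returns the result of the second lock attempt, or the first
   setup error (e.g. ENOTSUP if PI mutexes are unavailable). */
static int lock_after_owner_exit(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    pthread_t th;
    int e;

    if ((e = pthread_mutexattr_init(&attr)) != 0)
        return e;
    e = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (e == 0)
        e = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (e == 0)
        e = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (e != 0)
        return e;

    /* Let the owner run to completion before we try to lock. */
    e = pthread_create(&th, NULL, lock_and_exit, &m);
    if (e == 0)
        e = pthread_join(th, NULL);
    if (e != 0) {
        pthread_mutex_destroy(&m);
        return e;
    }

    e = pthread_mutex_lock(&m);
    if (e == EOWNERDEAD) {
        /* We own the mutex now; repair its state and release it. */
        pthread_mutex_consistent(&m);
        pthread_mutex_unlock(&m);
    } else if (e == 0) {
        pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return e;
}

/* Usage: EOWNERDEAD is the expected outcome where PI is supported. */
static void demo_lock_after_owner_exit(void)
{
    int r = lock_after_owner_exit();
    assert(r == EOWNERDEAD || r == ENOTSUP);
}
```

Dropping the pthread_join call would bring back the window in which the lock attempt can overlap with the owner's exit, which is the case that fails here.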
Bye
Stefan
* Re: FAIL nptl/tst-robustpi4
2017-01-26 15:29 FAIL nptl/tst-robustpi4 Stefan Liebler
@ 2017-01-26 16:12 ` Carlos O'Donell
2017-01-26 16:22 ` Torvald Riegel
0 siblings, 1 reply; 6+ messages in thread
From: Carlos O'Donell @ 2017-01-26 16:12 UTC (permalink / raw)
To: Stefan Liebler, libc-alpha
On 01/26/2017 10:29 AM, Stefan Liebler wrote:
> It seems as a race between futex- and exit-syscall causes ESRCH
> result from futex-syscall.
>
> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
> well as with 4.6 on a LPAR (but less often).
I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
across a wide number of kernels, but never tst-robustpi4.
https://sourceware.org/bugzilla/show_bug.cgi?id=19004
The robustpi support is certainly not very robust as Torvald's
recent fixes show, and there still remains at least one design
flaw that can't be fixed.
e.g.
https://sourceware.org/bugzilla/show_bug.cgi?id=14485
and...
https://sourceware.org/bugzilla/show_bug.cgi?id=19089
--
Cheers,
Carlos.
* Re: FAIL nptl/tst-robustpi4
2017-01-26 16:12 ` Carlos O'Donell
@ 2017-01-26 16:22 ` Torvald Riegel
2018-06-29 6:55 ` FAIL nptl/tst-robustpi4 [BZ 23183] Stefan Liebler
0 siblings, 1 reply; 6+ messages in thread
From: Torvald Riegel @ 2017-01-26 16:22 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: Stefan Liebler, libc-alpha
On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
> > It seems as a race between futex- and exit-syscall causes ESRCH
> > result from futex-syscall.
I'll have a closer look at this.
> > I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
> > well as with 4.6 on a LPAR (but less often).
>
> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
> across a wide number of kernels, but never tst-robustpi4.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>
> The robustpi support is certainly not very robust as Torvald's
> recent fixes show, and there still remains at least one design
> flaw that can't be fixed.
>
> e.g.
> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
The underlying problem for that bug does not affect PI+robust, just
robust, I think. Unless I forgot about something, PI+robust should
always use the kernel to unlock.
* Re: FAIL nptl/tst-robustpi4 [BZ 23183]
2017-01-26 16:22 ` Torvald Riegel
@ 2018-06-29 6:55 ` Stefan Liebler
2018-06-29 7:39 ` Florian Weimer
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Liebler @ 2018-06-29 6:55 UTC (permalink / raw)
To: libc-alpha; +Cc: Florian Weimer, Carlos O'Donell, Torvald Riegel
On 01/26/2017 05:22 PM, Torvald Riegel wrote:
> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>> It seems as a race between futex- and exit-syscall causes ESRCH
>>> result from futex-syscall.
>
> I'll have a closer look at this.
>
>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>> well as with 4.6 on a LPAR (but less often).
>>
>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>> across a wide number of kernels, but never tst-robustpi4.
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>
>> The robustpi support is certainly not very robust as Torvald's
>> recent fixes show, and there still remains at least one design
>> flaw that can't be fixed.
>>
>> e.g.
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>
> The underlying problem for that bug does not affect PI+robust, just
> robust, I think. Unless I forgot about something, PI+robust should
> always use the kernel to unlock.
>
>
>
Hi,
In the meantime, Florian Weimer was also able to reproduce this issue and
opened Bug 23183 - tst-robustpi4 test failure
(https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
I've also dug a bit deeper (see the details in bugzilla) and was also able
to reproduce it on Intel.
If the thread with locked mutex is executing the exit-syscall
while the main-thread is executing the futex-syscall,
then it could lead to this ESRCH return value of the futex-syscall which
triggers the assertion.
In this situation, the futex-syscall has already added the FUTEX_WAITERS
bit to the lock-value and is then calling attach_to_pi_owner().
The exit-syscall is now setting the lock-value to FUTEX_WAITERS |
FUTEX_OWNER_DIED and is proceeding.
attach_to_pi_owner() is now e.g. trying to get the owner-task and/or is
testing if the owner is currently exiting. In those cases, ESRCH is
returned!
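The sequence of lock-word updates described above can be sketched as a small user-space model. This is not kernel code: the two functions only mimic the updates as described in this mail (the futex syscall adding FUTEX_WAITERS, the exit path leaving FUTEX_WAITERS | FUTEX_OWNER_DIED with the TID cleared), and all names and the example TID are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the values in <linux/futex.h>. */
#define MODEL_FUTEX_WAITERS    UINT32_C(0x80000000)
#define MODEL_FUTEX_OWNER_DIED UINT32_C(0x40000000)
#define MODEL_FUTEX_TID_MASK   UINT32_C(0x3fffffff)

/* Step 1 (locker, in the futex syscall): mark the lock as contended. */
static uint32_t model_futex_add_waiters(uint32_t lock)
{
    return lock | MODEL_FUTEX_WAITERS;
}

/* Step 2 (owner, in the exit syscall): keep only the waiters bit,
   drop the TID and flag the owner as dead. */
static uint32_t model_exit_owner_died(uint32_t lock)
{
    return (lock & MODEL_FUTEX_WAITERS) | MODEL_FUTEX_OWNER_DIED;
}

/* Usage: replay the interleaving with a made-up owner TID of 1234. */
static void demo_lock_word_race(void)
{
    uint32_t lock = 1234;                 /* owned, uncontended */
    lock = model_futex_add_waiters(lock); /* locker enters futex */
    lock = model_exit_owner_died(lock);   /* owner exits meanwhile */
    assert(lock == UINT32_C(0xc0000000));
    assert((lock & MODEL_FUTEX_TID_MASK) == 0);
}
```

After step 2 the locker is still inside the futex syscall, so its owner lookup runs against a lock word that no longer names a live owner.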
Back in glibc, this assertion is triggered:
/* ESRCH can happen only for non-robust PI mutexes where
the owner of the lock died. */
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
The assertion/comment does not agree with the current behaviour of the
kernel. Any ideas?
Bye
Stefan
* Re: FAIL nptl/tst-robustpi4 [BZ 23183]
2018-06-29 6:55 ` FAIL nptl/tst-robustpi4 [BZ 23183] Stefan Liebler
@ 2018-06-29 7:39 ` Florian Weimer
2018-06-29 8:22 ` Stefan Liebler
0 siblings, 1 reply; 6+ messages in thread
From: Florian Weimer @ 2018-06-29 7:39 UTC (permalink / raw)
To: Stefan Liebler, libc-alpha; +Cc: Carlos O'Donell, Torvald Riegel
On 06/29/2018 08:54 AM, Stefan Liebler wrote:
> On 01/26/2017 05:22 PM, Torvald Riegel wrote:
>> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>>> It seems as a race between futex- and exit-syscall causes ESRCH
>>>> result from futex-syscall.
>>
>> I'll have a closer look at this.
>>
>>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>>> well as with 4.6 on a LPAR (but less often).
>>>
>>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>>> across a wide number of kernels, but never tst-robustpi4.
>>>
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>>
>>> The robustpi support is certainly not very robust as Torvald's
>>> recent fixes show, and there still remains at least one design
>>> flaw that can't be fixed.
>>>
>>> e.g.
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>>
>> The underlying problem for that bug does not affect PI+robust, just
>> robust, I think. Unless I forgot about something, PI+robust should
>> always use the kernel to unlock.
> in the meantime, Florian Weimer could also reproduce this issue and
> opened the bugzilla Bug 23183 - tst-robustpi4 test failure
> (https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
>
> I've also dig a bit deeper - see details in bugzilla - and was also able
> to reproduce it on intel.
>
> If the thread with locked mutex is executing the exit-syscall
> while the main-thread is executing the futex-syscall,
> then it could lead to this ESRCH return value of the futex-syscall which
> triggers the assertion.
>
> In this situation, the futex-syscall has already added the FUTEX_WAITERS
> bit to the lock-value and is then calling attach_to_pi_owner().
>
> The exit-syscall is now setting the lock-value to FUTEX_WAITERS |
> FUTEX_OWNER_DIED and is proceeding.
>
> attach_to_pi_owner() is now e.g. trying to get the owner-task and/or is
> testing if the owner is currently exiting. In those cases, ESRCH is
> returned!
Does the kernel look at the TID and determine that it no longer exists,
or does it use the FUTEX_OWNER_DIED bit to detect this situation?
I'm worried that using the TID introduces a TID race here.
Thanks,
Florian
* Re: FAIL nptl/tst-robustpi4 [BZ 23183]
2018-06-29 7:39 ` Florian Weimer
@ 2018-06-29 8:22 ` Stefan Liebler
0 siblings, 0 replies; 6+ messages in thread
From: Stefan Liebler @ 2018-06-29 8:22 UTC (permalink / raw)
To: Florian Weimer, libc-alpha; +Cc: Carlos O'Donell, Torvald Riegel
On 06/29/2018 09:39 AM, Florian Weimer wrote:
> On 06/29/2018 08:54 AM, Stefan Liebler wrote:
>> On 01/26/2017 05:22 PM, Torvald Riegel wrote:
>>> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>>>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>>>> It seems as a race between futex- and exit-syscall causes ESRCH
>>>>> result from futex-syscall.
>>>
>>> I'll have a closer look at this.
>>>
>>>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>>>> well as with 4.6 on a LPAR (but less often).
>>>>
>>>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>>>> across a wide number of kernels, but never tst-robustpi4.
>>>>
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>>>
>>>> The robustpi support is certainly not very robust as Torvald's
>>>> recent fixes show, and there still remains at least one design
>>>> flaw that can't be fixed.
>>>>
>>>> e.g.
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>>>
>>> The underlying problem for that bug does not affect PI+robust, just
>>> robust, I think. Unless I forgot about something, PI+robust should
>>> always use the kernel to unlock.
>
>> in the meantime, Florian Weimer could also reproduce this issue and
>> opened the bugzilla Bug 23183 - tst-robustpi4 test failure
>> (https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
>>
>> I've also dig a bit deeper - see details in bugzilla - and was also
>> able to reproduce it on intel.
>>
>> If the thread with locked mutex is executing the exit-syscall
>> while the main-thread is executing the futex-syscall,
>> then it could lead to this ESRCH return value of the futex-syscall
>> which triggers the assertion.
>>
>> In this situation, the futex-syscall has already added the
>> FUTEX_WAITERS bit to the lock-value and is then calling
>> attach_to_pi_owner().
>>
>> The exit-syscall is now setting the lock-value to FUTEX_WAITERS |
>> FUTEX_OWNER_DIED and is proceeding.
>>
>> attach_to_pi_owner() is now e.g. trying to get the owner-task and/or
>> is testing if the owner is currently exiting. In those cases, ESRCH is
>> returned!
>
> Does the kernel look at the TID and determine that it no longer exists,
> or does it use the FUTEX_OWNER_DIED bit to detect this situation?
>
> I'm worried that using the TID introduces a TID race here.
>
> Thanks,
> Florian
>
There are several different cases.
If the exit-syscall has already called handle_futex_death(), the TID is
zeroed and the lock value is FUTEX_WAITERS | FUTEX_OWNER_DIED.
Then attach_to_pi_owner() fails with ESRCH due to:
    pid_t pid = uval & FUTEX_TID_MASK;
    struct task_struct *p;
    if (!pid) return -ESRCH;
    ...
But even if the TID is still available, it can fail due to:
    p = find_get_task_by_vpid(pid);
    if (!p) return -ESRCH;
If we've got the task_struct and (p->flags & PF_EXITING) is true, the
result depends on (p->flags & PF_EXITPIDONE): either ESRCH is returned
immediately, or EAGAIN is returned, which leads to attach_to_pi_owner()
being called again. After one or more rounds, it finally returns ESRCH.
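The cases above can be summarized in a simplified user-space model of the decision. It is only a sketch of the logic as described in this mail, not the kernel function: model_task, the MODEL_PF_* flag values and model_attach_to_pi_owner are hypothetical stand-ins, and a NULL task pointer stands for a failed find_get_task_by_vpid() lookup:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror FUTEX_TID_MASK from <linux/futex.h>. */
#define MODEL_TID_MASK UINT32_C(0x3fffffff)

/* Stand-ins for PF_EXITING / PF_EXITPIDONE; values are arbitrary. */
enum { MODEL_PF_EXITING = 1 << 0, MODEL_PF_EXITPIDONE = 1 << 1 };

struct model_task {
    unsigned flags;
};

/* p == NULL models "no task found for this TID". */
static int model_attach_to_pi_owner(uint32_t uval,
                                    const struct model_task *p)
{
    uint32_t pid = uval & MODEL_TID_MASK;

    if (pid == 0)
        return -ESRCH;   /* TID already zeroed by the exit path */
    if (p == NULL)
        return -ESRCH;   /* owner task is already gone */
    if (p->flags & MODEL_PF_EXITING)
        return (p->flags & MODEL_PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
    return 0;            /* owner is alive, attaching can proceed */
}

/* Usage: one call per case from the description. */
static void demo_attach_cases(void)
{
    struct model_task exiting = { MODEL_PF_EXITING };
    struct model_task done = { MODEL_PF_EXITING | MODEL_PF_EXITPIDONE };
    struct model_task alive = { 0 };

    assert(model_attach_to_pi_owner(UINT32_C(0xc0000000), &alive) == -ESRCH);
    assert(model_attach_to_pi_owner(1234, NULL) == -ESRCH);
    assert(model_attach_to_pi_owner(1234, &exiting) == -EAGAIN);
    assert(model_attach_to_pi_owner(1234, &done) == -ESRCH);
    assert(model_attach_to_pi_owner(1234, &alive) == 0);
}
```

In this model every path on which the owner is exiting or gone ends in ESRCH, either directly or after the EAGAIN retry; none of them looks at the FUTEX_OWNER_DIED bit.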