* FAILING nptl/tst-robust8pi
@ 2024-01-18 14:58 Stefan Liebler
2024-01-19 10:32 ` Stefan Liebler
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Liebler @ 2024-01-18 14:58 UTC (permalink / raw)
To: GNU C Library
[-- Attachment #1: Type: text/plain, Size: 1999 bytes --]
Hi,
I've observed (on s390x) that nptl/tst-robust8pi sometimes fails (e.g.
if run it multiple times in a loop). I only see it with linux 6.7 (some
release-candidate, not the release) and linux 6.8 (thus I assume it will
also fail with the real 6.7 release).
In each case it fails with something like this:
mutex_timedlock of 66 in thread 7 failed with 22
=> pthread_mutex_timedlock returns 22=EINVAL
If I reboot with linux 6.6.9 and run the same binary again, I don't see
fails.
Does anybody else see such fails?
I've reduced the test (see attachement) and now have only one process
with three threads. I only use one mutex with attributes like the
original testcase: PTHREAD_MUTEX_ROBUST_NP, PTHREAD_PROCESS_SHARED,
PTHREAD_PRIO_INHERIT.
Every thread is doing a loop with pthread_mutex_timedlock(abstime={0,0})
and if locked, pthread_mutex_unlock.
There is a hint in <glibc>/nptl/futex-internal.c:__futex_lock_pi64():
case -EINVAL:
/* This indicates either state corruption or that the kernel
found a waiter on futex address which is waiting via
FUTEX_WAIT or FUTEX_WAIT_BITSET. This is reported on
some futex_lock_pi usage (pthread_mutex_timedlock for
instance). */
I've added some uprobes before and after the futex-syscall in
__futex_lock_pi64(in pthread_mutex_timedlock) and futex_unlock_pi(in
pthread_mutex_unlock). For me __ASSUME_FUTEX_LOCK_PI2 is not available,
but __ASSUME_TIME64_SYSCALLS is defined.
For me it looks like this (simplified ubprobes-trace):
<thread> <timestamp>: <probe>
t1 4309589.419744: before syscall in __futex_lock_pi64
t3 4309589.419745: before syscall in futex_unlock_pi
t2 4309589.419745: before syscall in __futex_lock_pi64
t3 4309589.419747: after syscall in futex_unlock_pi
t2 4309589.419747: after syscall in __futex_lock_pi64 ret=-22=EINVAL
t1 4309589.419748: after syscall in __futex_lock_pi64 ret=-110=ETIMEDOUT
Is this a kernel bug due to a recent change?
Or is this valid kernel behavior?
And should glibc handle it?
Bye,
Stefan
[-- Attachment #2: tst-robust8pi-20240118.c --]
[-- Type: text/x-csrc, Size: 4161 bytes --]
//CFLAGS=-pthread
//LDFLAGS=-lpthread
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#define NUM_THREADS 3
#define THREAD_FUNC thr_func
#define USE_BARRIER 1
#ifndef ROUNDS
# define ROUNDS 100000000
#endif
typedef struct thr_info
{
int nr;
pthread_t thread;
} __attribute__ ((aligned (256))) thr_info_t;
#define THR_INIT() \
thr_info_t *thr = (thr_info_t *) arg;
#define THR_PRINTF(fmt, ...) \
printf ("#%d: " fmt, thr->nr, __VA_ARGS__)
#define THR_PUTS(msg) \
printf ("#%d: " msg "\n", thr->nr)
#if USE_BARRIER != 0
static pthread_barrier_t thrs_barrier;
#endif
static pthread_mutex_t mtx;
static const struct timespec before = { 0, 0 };
/* ###################################################################
thread func
############################################################### */
static void *
thr_func (void *arg)
{
THR_INIT ();
int state = 0;
int fct;
#if 0
/* 3 threads, 1xfct=0=pthread_mutex_lock, 2xfct=1=pthread_mutex_timedlock: EINVAL. */
fct = (thr->nr + 1) % 2;
#elif 0
/* 3 threads, 2xfct=0=pthread_mutex_lock, 1xfct=1=pthread_mutex_timedlock: no fails. */
fct = (thr->nr) % 2;
#elif 1
/* >3 threads, fct=1=only pthread_mutex_timedlock: EINVAL. */
fct = 1;
#endif
int round = 0;
THR_PRINTF ("started: fct=%d\n", fct);
#if USE_BARRIER != 0
pthread_barrier_wait (&thrs_barrier);
#endif
while (1)
{
if (state == 0)
{
round ++;
int e;
switch (fct)
{
case 0:
e = pthread_mutex_lock (&mtx);
if (e != 0)
{
THR_PRINTF ("mutex_lock failed with %d (round=%d)\n", e, round);
exit (1);
}
state = 1;
break;
case 1:
e = pthread_mutex_timedlock (&mtx, &before);
if (e != 0 && e != ETIMEDOUT)
{
THR_PRINTF ("mutex_timedlock failed with %d (round=%d)\n", e, round);
exit (1);
}
break;
default:
e = pthread_mutex_trylock (&mtx);
if (e != 0 && e != EBUSY)
{
THR_PRINTF ("mutex_trylock failed with %d (round=%d)\n", e, round);
exit (1);
}
break;
}
if (e == EOWNERDEAD)
pthread_mutex_consistent (&mtx);
if (e == 0 || e == EOWNERDEAD)
state = 1;
}
else
{
int e = pthread_mutex_unlock (&mtx);
if (e != 0)
{
THR_PRINTF ("mutex_unlock of failed with %d (round=%d)\n", e, round);
exit (1);
}
state = 0;
}
if (round >= ROUNDS)
{
THR_PRINTF ("REACHED round %d. => exit\n", ROUNDS);
if (state != 0)
{
int e = pthread_mutex_unlock (&mtx);
if (e != 0)
{
THR_PRINTF ("mutex_unlock@exit of failed with %d (round=%d)\n", e, round);
exit (1);
}
}
break;
}
}
return NULL;
}
int
main (void)
{
int i;
printf ("main: start %d threads.\n", NUM_THREADS);
#if USE_BARRIER != 0
pthread_barrier_init (&thrs_barrier, NULL, NUM_THREADS + 1);
#endif
pthread_mutexattr_t ma;
if (pthread_mutexattr_init (&ma) != 0)
{
puts ("mutexattr_init failed");
return 0;
}
if (pthread_mutexattr_setrobust (&ma, PTHREAD_MUTEX_ROBUST_NP) != 0)
{
puts ("mutexattr_setrobust failed");
return 1;
}
if (pthread_mutexattr_setpshared (&ma, PTHREAD_PROCESS_SHARED) != 0)
{
puts ("mutexattr_setpshared failed");
return 1;
}
if (pthread_mutexattr_setprotocol (&ma, PTHREAD_PRIO_INHERIT) != 0)
{
puts ("pthread_mutexattr_setprotocol failed");
return 1;
}
if (pthread_mutex_init (&mtx, &ma) != 0)
{
puts ("pthread_mutex_init failed");
return 1;
}
thr_info_t thrs[NUM_THREADS];
for (i = 0; i < NUM_THREADS; i++)
{
thrs[i].nr = i;
assert (pthread_create (&(thrs[i].thread), NULL, THREAD_FUNC, &(thrs[i]))
== 0);;
}
#if USE_BARRIER != 0
/* All threads start work after this barrier. */
pthread_barrier_wait (&thrs_barrier);
#endif
for (i = 0; i < NUM_THREADS; i++)
{
pthread_join (thrs[i].thread, NULL);
}
#if USE_BARRIER != 0
pthread_barrier_destroy (&thrs_barrier);
#endif
printf ("main: end.\n");
return EXIT_SUCCESS;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: FAILING nptl/tst-robust8pi
2024-01-18 14:58 FAILING nptl/tst-robust8pi Stefan Liebler
@ 2024-01-19 10:32 ` Stefan Liebler
2024-01-19 12:26 ` Florian Weimer
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Liebler @ 2024-01-19 10:32 UTC (permalink / raw)
To: libc-alpha; +Cc: Florian Weimer, Carlos O'Donell
On 18.01.24 15:58, Stefan Liebler wrote:
> Hi,
>
> I've observed (on s390x) that nptl/tst-robust8pi sometimes fails (e.g.
> if run it multiple times in a loop). I only see it with linux 6.7 (some
> release-candidate, not the release) and linux 6.8 (thus I assume it will
> also fail with the real 6.7 release).
>
> In each case it fails with something like this:
> mutex_timedlock of 66 in thread 7 failed with 22
> => pthread_mutex_timedlock returns 22=EINVAL
>
> If I reboot with linux 6.6.9 and run the same binary again, I don't see
> fails.
>
> Does anybody else see such fails?
>
> I've reduced the test (see attachement) and now have only one process
> with three threads. I only use one mutex with attributes like the
> original testcase: PTHREAD_MUTEX_ROBUST_NP, PTHREAD_PROCESS_SHARED,
> PTHREAD_PRIO_INHERIT.
> Every thread is doing a loop with pthread_mutex_timedlock(abstime={0,0})
> and if locked, pthread_mutex_unlock.
>
> There is a hint in <glibc>/nptl/futex-internal.c:__futex_lock_pi64():
> case -EINVAL:
> /* This indicates either state corruption or that the kernel
> found a waiter on futex address which is waiting via
> FUTEX_WAIT or FUTEX_WAIT_BITSET. This is reported on
> some futex_lock_pi usage (pthread_mutex_timedlock for
> instance). */
>
>
> I've added some uprobes before and after the futex-syscall in
> __futex_lock_pi64(in pthread_mutex_timedlock) and futex_unlock_pi(in
> pthread_mutex_unlock). For me __ASSUME_FUTEX_LOCK_PI2 is not available,
> but __ASSUME_TIME64_SYSCALLS is defined.
>
> For me it looks like this (simplified ubprobes-trace):
> <thread> <timestamp>: <probe>
> t1 4309589.419744: before syscall in __futex_lock_pi64
>
> t3 4309589.419745: before syscall in futex_unlock_pi
> t2 4309589.419745: before syscall in __futex_lock_pi64
>
> t3 4309589.419747: after syscall in futex_unlock_pi
> t2 4309589.419747: after syscall in __futex_lock_pi64 ret=-22=EINVAL
>
> t1 4309589.419748: after syscall in __futex_lock_pi64 ret=-110=ETIMEDOUT
>
>
> Is this a kernel bug due to a recent change?
> Or is this valid kernel behavior?
> And should glibc handle it?
>
> Bye,
> Stefan
In the meanwhile I also retested with glibc 2.38 release tag to be sure
there was no glibc change in between. => same test error.
I also created a kvm-guest on my x86_64 machine with fedora 39 and used
this kernel from copr-repository:
Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
SMP PREEMPT_DYNAMIC Sun Jan 7 06:17:30 UTC 2024 x86_64 GNU/Linux
=> Now I also see the EINVAL on x86_64
@Florian / @Carlos: Have you already saw this on fedora side?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: FAILING nptl/tst-robust8pi
2024-01-19 10:32 ` Stefan Liebler
@ 2024-01-19 12:26 ` Florian Weimer
2024-01-19 14:16 ` Stefan Liebler
0 siblings, 1 reply; 6+ messages in thread
From: Florian Weimer @ 2024-01-19 12:26 UTC (permalink / raw)
To: Stefan Liebler; +Cc: libc-alpha, Carlos O'Donell
* Stefan Liebler:
> In the meanwhile I also retested with glibc 2.38 release tag to be sure
> there was no glibc change in between. => same test error.
>
> I also created a kvm-guest on my x86_64 machine with fedora 39 and used
> this kernel from copr-repository:
> Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
> SMP PREEMPT_DYNAMIC Sun Jan 7 06:17:30 UTC 2024 x86_64 GNU/Linux
> => Now I also see the EINVAL on x86_64
>
> @Florian / @Carlos: Have you already saw this on fedora side?
There has been a refactor of the futex code in the kernel. The initial
version was buggy:
Re: Several tst-robust* tests time out with recent Linux kernel
<https://lore.kernel.org/all/20231114201402.GA25315@noisy.programming.kicks-ass.net/>
Maybe there are more issues remaining?
Thanks,
Florian
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: FAILING nptl/tst-robust8pi
2024-01-19 12:26 ` Florian Weimer
@ 2024-01-19 14:16 ` Stefan Liebler
2024-01-19 15:48 ` Xi Ruoyao
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Liebler @ 2024-01-19 14:16 UTC (permalink / raw)
To: Florian Weimer; +Cc: libc-alpha, Carlos O'Donell
On 19.01.24 13:26, Florian Weimer wrote:
> * Stefan Liebler:
>
>> In the meanwhile I also retested with glibc 2.38 release tag to be sure
>> there was no glibc change in between. => same test error.
>>
>> I also created a kvm-guest on my x86_64 machine with fedora 39 and used
>> this kernel from copr-repository:
>> Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
>> SMP PREEMPT_DYNAMIC Sun Jan 7 06:17:30 UTC 2024 x86_64 GNU/Linux
>> => Now I also see the EINVAL on x86_64
>>
>> @Florian / @Carlos: Have you already saw this on fedora side?
>
> There has been a refactor of the futex code in the kernel. The initial
> version was buggy:
>
> Re: Several tst-robust* tests time out with recent Linux kernel
> <https://lore.kernel.org/all/20231114201402.GA25315@noisy.programming.kicks-ass.net/>
>
> Maybe there are more issues remaining?
>
> Thanks,
> Florian
>
Thanks for the hint.
Yes, this thread contains my observed issue. I've answered there:
"Re: Several tst-robust* tests time out with recent Linux kernel"
link on glibc list:
https://sourceware.org/pipermail/libc-alpha/2024-January/154170.html
link on kernel list:
https://lore.kernel.org/all/fc3fd07a-218d-406c-918b-e7f701968eb0@linux.ibm.com/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: FAILING nptl/tst-robust8pi
2024-01-19 14:16 ` Stefan Liebler
@ 2024-01-19 15:48 ` Xi Ruoyao
2024-01-21 19:45 ` Xi Ruoyao
0 siblings, 1 reply; 6+ messages in thread
From: Xi Ruoyao @ 2024-01-19 15:48 UTC (permalink / raw)
To: Stefan Liebler, Florian Weimer; +Cc: libc-alpha, Carlos O'Donell
On Fri, 2024-01-19 at 15:16 +0100, Stefan Liebler wrote:
> On 19.01.24 13:26, Florian Weimer wrote:
> > * Stefan Liebler:
> >
> > > In the meanwhile I also retested with glibc 2.38 release tag to be sure
> > > there was no glibc change in between. => same test error.
> > >
> > > I also created a kvm-guest on my x86_64 machine with fedora 39 and used
> > > this kernel from copr-repository:
> > > Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
> > > SMP PREEMPT_DYNAMIC Sun Jan 7 06:17:30 UTC 2024 x86_64 GNU/Linux
> > > => Now I also see the EINVAL on x86_64
> > >
> > > @Florian / @Carlos: Have you already saw this on fedora side?
> >
> > There has been a refactor of the futex code in the kernel. The initial
> > version was buggy:
> >
> > Re: Several tst-robust* tests time out with recent Linux kernel
> > <https://lore.kernel.org/all/20231114201402.GA25315@noisy.programming.kicks-ass.net/>
> >
> > Maybe there are more issues remaining?
> >
> > Thanks,
> > Florian
> >
> Thanks for the hint.
> Yes, this thread contains my observed issue. I've answered there:
> "Re: Several tst-robust* tests time out with recent Linux kernel"
> link on glibc list:
> https://sourceware.org/pipermail/libc-alpha/2024-January/154170.html
> link on kernel list:
> https://lore.kernel.org/all/fc3fd07a-218d-406c-918b-e7f701968eb0@linux.ibm.com/
As the reporter of the linked kernel bug: I don't think this is the same
issue you are encountering. The tst-robust* timeouts I found were
highly deterministic, i.e. they *always* failed, no "sometimes". And
IIRC the failures I observed were deadlock-alike, i.e. the futex syscall
just hangs indefinitely w/o returning EINVAL or something.
As a Glibc machine tester: I've tested Glibc about a dozen times
recently on several different architectures and I've not seen any tst-
robust8pi failure.
As a Linux From Scratch maintainer: there had been some complains about
additional Glibc failures from people building LFS in virtual machines
for a long time. I cannot recall if the VMM they used was QEMU/KVM
though.
--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: FAILING nptl/tst-robust8pi
2024-01-19 15:48 ` Xi Ruoyao
@ 2024-01-21 19:45 ` Xi Ruoyao
0 siblings, 0 replies; 6+ messages in thread
From: Xi Ruoyao @ 2024-01-21 19:45 UTC (permalink / raw)
To: Stefan Liebler, Florian Weimer; +Cc: libc-alpha, Carlos O'Donell
On Fri, 2024-01-19 at 23:48 +0800, Xi Ruoyao wrote:
> On Fri, 2024-01-19 at 15:16 +0100, Stefan Liebler wrote:
> > On 19.01.24 13:26, Florian Weimer wrote:
> > > * Stefan Liebler:
> > >
> > > > In the meanwhile I also retested with glibc 2.38 release tag to be sure
> > > > there was no glibc change in between. => same test error.
> > > >
> > > > I also created a kvm-guest on my x86_64 machine with fedora 39 and used
> > > > this kernel from copr-repository:
> > > > Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
> > > > SMP PREEMPT_DYNAMIC Sun Jan 7 06:17:30 UTC 2024 x86_64 GNU/Linux
> > > > => Now I also see the EINVAL on x86_64
> > > >
> > > > @Florian / @Carlos: Have you already saw this on fedora side?
> > >
> > > There has been a refactor of the futex code in the kernel. The initial
> > > version was buggy:
> > >
> > > Re: Several tst-robust* tests time out with recent Linux kernel
> > > <https://lore.kernel.org/all/20231114201402.GA25315@noisy.programming.kicks-ass.net/>
> > >
> > > Maybe there are more issues remaining?
> > >
> > > Thanks,
> > > Florian
> > >
> > Thanks for the hint.
> > Yes, this thread contains my observed issue. I've answered there:
> > "Re: Several tst-robust* tests time out with recent Linux kernel"
> > link on glibc list:
> > https://sourceware.org/pipermail/libc-alpha/2024-January/154170.html
> > link on kernel list:
> > https://lore.kernel.org/all/fc3fd07a-218d-406c-918b-e7f701968eb0@linux.ibm.com/
>
> As the reporter of the linked kernel bug: I don't think this is the same
> issue you are encountering. The tst-robust* timeouts I found were
> highly deterministic, i.e. they *always* failed, no "sometimes". And
> IIRC the failures I observed were deadlock-alike, i.e. the futex syscall
> just hangs indefinitely w/o returning EINVAL or something.
EINVAL may be this one:
https://lore.kernel.org/all/20240116130810.ji1YCxpg@linutronix.de/
--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-01-21 19:45 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-18 14:58 FAILING nptl/tst-robust8pi Stefan Liebler
2024-01-19 10:32 ` Stefan Liebler
2024-01-19 12:26 ` Florian Weimer
2024-01-19 14:16 ` Stefan Liebler
2024-01-19 15:48 ` Xi Ruoyao
2024-01-21 19:45 ` Xi Ruoyao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).