* Several tst-robust* tests time out with recent Linux kernel
@ 2023-11-13 18:33 Xi Ruoyao
  2023-11-14  9:46 ` Xi Ruoyao
  0 siblings, 1 reply; 14+ messages in thread
From: Xi Ruoyao @ 2023-11-13 18:33 UTC
To: libc-alpha

Hi,

With Linux 6.7.0-rc1, several tst-robust* tests time out on x86_64:

FAIL: nptl/tst-robust1
FAIL: nptl/tst-robust3
FAIL: nptl/tst-robust4
FAIL: nptl/tst-robust6
FAIL: nptl/tst-robust7
FAIL: nptl/tst-robust9

This does not happen with Linux 6.6.0.  Do you have some clue about it?

--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-13 18:33 Several tst-robust* tests time out with recent Linux kernel Xi Ruoyao
@ 2023-11-14  9:46 ` Xi Ruoyao
  2023-11-14 15:31   ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Xi Ruoyao @ 2023-11-14  9:46 UTC
To: Peter Zijlstra (Intel), libc-alpha
Cc: linux-kernel, linux-api, linux-mm, linux-arch, Thomas Gleixner,
	André Almeida

On Tue, 2023-11-14 at 02:33 +0800, Xi Ruoyao wrote:
> Hi,
>
> With Linux 6.7.0-rc1, several tst-robust* tests time out on x86_64:
>
> FAIL: nptl/tst-robust1
> FAIL: nptl/tst-robust3
> FAIL: nptl/tst-robust4
> FAIL: nptl/tst-robust6
> FAIL: nptl/tst-robust7
> FAIL: nptl/tst-robust9
>
> This does not happen with Linux 6.6.0.  Do you have some clue about
> it?

Bisected to the kernel commit:

commit 5694289ce183bc3336407a78c8c722a0b9208f9b (HEAD)
Author: peterz@infradead.org <peterz@infradead.org>
Date:   Thu Sep 21 12:45:08 2023 +0200

    futex: Flag conversion

    Futex has 3 sets of flags:

     - legacy futex op bits
     - futex2 flags
     - internal flags

    Add a few helpers to convert from the API flags into the internal
    flags.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: André Almeida <andrealmeid@igalia.com>
    Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net

--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-14  9:46 ` Xi Ruoyao
@ 2023-11-14 15:31   ` Peter Zijlstra
  2023-11-14 15:40     ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2023-11-14 15:31 UTC
To: Xi Ruoyao
Cc: libc-alpha, linux-kernel, linux-api, linux-mm, linux-arch,
	Thomas Gleixner, André Almeida

On Tue, Nov 14, 2023 at 05:46:43PM +0800, Xi Ruoyao wrote:
> On Tue, 2023-11-14 at 02:33 +0800, Xi Ruoyao wrote:
> > Hi,
> >
> > With Linux 6.7.0-rc1, several tst-robust* tests time out on x86_64:
> >
> > FAIL: nptl/tst-robust1
> > FAIL: nptl/tst-robust3
> > FAIL: nptl/tst-robust4
> > FAIL: nptl/tst-robust6
> > FAIL: nptl/tst-robust7
> > FAIL: nptl/tst-robust9
> >
> > This does not happen with Linux 6.6.0.  Do you have some clue about
> > it?
>
> Bisected to the kernel commit:
>
> commit 5694289ce183bc3336407a78c8c722a0b9208f9b (HEAD)
> Author: peterz@infradead.org <peterz@infradead.org>
> Date:   Thu Sep 21 12:45:08 2023 +0200
>
>     futex: Flag conversion
>
>     Futex has 3 sets of flags:
>
>      - legacy futex op bits
>      - futex2 flags
>      - internal flags
>
>     Add a few helpers to convert from the API flags into the internal
>     flags.
>
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>     Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
>     Reviewed-by: André Almeida <andrealmeid@igalia.com>
>     Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net

I can confirm. I'm also going crazy trying to figure out how this
happens.

The below is sufficient to make it unhappy...

/me most puzzled

---
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index b5379c0e6d6d..1a1f9301251f 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -17,7 +17,7 @@
  * restarts.
  */
 #ifdef CONFIG_MMU
-# define FLAGS_SHARED		0x01
+# define FLAGS_SHARED		0x10
 #else
 /*
  * NOMMU does not have per process address space. Let the compiler optimize
@@ -25,8 +25,8 @@
  */
 # define FLAGS_SHARED		0x00
 #endif
-#define FLAGS_CLOCKRT		0x02
-#define FLAGS_HAS_TIMEOUT	0x04
+#define FLAGS_CLOCKRT		0x20
+#define FLAGS_HAS_TIMEOUT	0x40
 
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);

* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-14 15:31   ` Peter Zijlstra
@ 2023-11-14 15:40     ` Peter Zijlstra
  2023-11-14 16:43       ` Florian Weimer
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2023-11-14 15:40 UTC
To: Xi Ruoyao
Cc: libc-alpha, linux-kernel, linux-api, linux-mm, linux-arch,
	Thomas Gleixner, André Almeida

On Tue, Nov 14, 2023 at 04:31:00PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 14, 2023 at 05:46:43PM +0800, Xi Ruoyao wrote:
> > On Tue, 2023-11-14 at 02:33 +0800, Xi Ruoyao wrote:
> > > Hi,
> > >
> > > With Linux 6.7.0-rc1, several tst-robust* tests time out on x86_64:
> > >
> > > FAIL: nptl/tst-robust1
> > > FAIL: nptl/tst-robust3
> > > FAIL: nptl/tst-robust4
> > > FAIL: nptl/tst-robust6
> > > FAIL: nptl/tst-robust7
> > > FAIL: nptl/tst-robust9
> > >
> > > This does not happen with Linux 6.6.0.  Do you have some clue about
> > > it?
> >
> > Bisected to the kernel commit:
> >
> > commit 5694289ce183bc3336407a78c8c722a0b9208f9b (HEAD)
> > Author: peterz@infradead.org <peterz@infradead.org>
> > Date:   Thu Sep 21 12:45:08 2023 +0200
> >
> >     futex: Flag conversion
> >
> >     Futex has 3 sets of flags:
> >
> >      - legacy futex op bits
> >      - futex2 flags
> >      - internal flags
> >
> >     Add a few helpers to convert from the API flags into the internal
> >     flags.
> >
> >     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >     Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> >     Reviewed-by: André Almeida <andrealmeid@igalia.com>
> >     Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net
>
> I can confirm. I'm also going crazy trying to figure out how this
> happens.
>
> The below is sufficient to make it unhappy...
>
> /me most puzzled
>
> ---
> diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
> index b5379c0e6d6d..1a1f9301251f 100644
> --- a/kernel/futex/futex.h
> +++ b/kernel/futex/futex.h
> @@ -17,7 +17,7 @@
>   * restarts.
>   */
>  #ifdef CONFIG_MMU
> -# define FLAGS_SHARED		0x01
> +# define FLAGS_SHARED		0x10
>  #else
>  /*
>   * NOMMU does not have per process address space. Let the compiler optimize

Just the above seems sufficient.

* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-14 15:40     ` Peter Zijlstra
@ 2023-11-14 16:43       ` Florian Weimer
  2023-11-14 20:14         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Weimer @ 2023-11-14 16:43 UTC
To: Peter Zijlstra
Cc: Xi Ruoyao, libc-alpha, linux-kernel, linux-api, linux-mm,
	linux-arch, Thomas Gleixner, André Almeida

* Peter Zijlstra:

>> diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
>> index b5379c0e6d6d..1a1f9301251f 100644
>> --- a/kernel/futex/futex.h
>> +++ b/kernel/futex/futex.h
>> @@ -17,7 +17,7 @@
>>   * restarts.
>>   */
>>  #ifdef CONFIG_MMU
>> -# define FLAGS_SHARED		0x01
>> +# define FLAGS_SHARED		0x10
>>  #else
>>  /*
>>   * NOMMU does not have per process address space. Let the compiler optimize
>
> Just the above seems sufficient.

There are a few futex_wake calls which hard-code the flags argument as
1:

kernel/futex/core.c=637=static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
--
kernel/futex/core.c-686-         * this.
kernel/futex/core.c-687-         */
kernel/futex/core.c-688-        owner = uval & FUTEX_TID_MASK;
kernel/futex/core.c-689-
kernel/futex/core.c-690-        if (pending_op && !pi && !owner) {
kernel/futex/core.c:691:                futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
kernel/futex/core.c-692-                return 0;
kernel/futex/core.c-693-        }
kernel/futex/core.c-694-
kernel/futex/core.c-695-        if (owner != task_pid_vnr(curr))
kernel/futex/core.c-696-                return 0;
--
kernel/futex/core.c-739-        /*
kernel/futex/core.c-740-         * Wake robust non-PI futexes here. The wakeup of
kernel/futex/core.c-741-         * PI futexes happens in exit_pi_state():
kernel/futex/core.c-742-         */
kernel/futex/core.c-743-        if (!pi && (uval & FUTEX_WAITERS))
kernel/futex/core.c:744:                futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
kernel/futex/core.c-745-
kernel/futex/core.c-746-        return 0;
kernel/futex/core.c-747-}
kernel/futex/core.c-748-
kernel/futex/core.c-749-/*

Thanks,
Florian

* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-14 16:43       ` Florian Weimer
@ 2023-11-14 20:14         ` Peter Zijlstra
  2023-11-15  1:11           ` Edgecombe, Rick P
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2023-11-14 20:14 UTC
To: Florian Weimer
Cc: Xi Ruoyao, libc-alpha, linux-kernel, linux-api, linux-mm,
	linux-arch, Thomas Gleixner, André Almeida

On Tue, Nov 14, 2023 at 05:43:20PM +0100, Florian Weimer wrote:
> * Peter Zijlstra:
>
> >> diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
> >> index b5379c0e6d6d..1a1f9301251f 100644
> >> --- a/kernel/futex/futex.h
> >> +++ b/kernel/futex/futex.h
> >> @@ -17,7 +17,7 @@
> >>   * restarts.
> >>   */
> >>  #ifdef CONFIG_MMU
> >> -# define FLAGS_SHARED		0x01
> >> +# define FLAGS_SHARED		0x10
> >>  #else
> >>  /*
> >>   * NOMMU does not have per process address space. Let the compiler optimize
> >
> > Just the above seems sufficient.
>
> There are a few futex_wake calls which hard-code the flags argument as
> 1:
>
> kernel/futex/core.c=637=static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
> --
> kernel/futex/core.c-686-         * this.
> kernel/futex/core.c-687-         */
> kernel/futex/core.c-688-        owner = uval & FUTEX_TID_MASK;
> kernel/futex/core.c-689-
> kernel/futex/core.c-690-        if (pending_op && !pi && !owner) {
> kernel/futex/core.c:691:                futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
> kernel/futex/core.c-692-                return 0;
> kernel/futex/core.c-693-        }
> kernel/futex/core.c-694-
> kernel/futex/core.c-695-        if (owner != task_pid_vnr(curr))
> kernel/futex/core.c-696-                return 0;
> --
> kernel/futex/core.c-739-        /*
> kernel/futex/core.c-740-         * Wake robust non-PI futexes here. The wakeup of
> kernel/futex/core.c-741-         * PI futexes happens in exit_pi_state():
> kernel/futex/core.c-742-         */
> kernel/futex/core.c-743-        if (!pi && (uval & FUTEX_WAITERS))
> kernel/futex/core.c:744:                futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
> kernel/futex/core.c-745-
> kernel/futex/core.c-746-        return 0;
> kernel/futex/core.c-747-}
> kernel/futex/core.c-748-
> kernel/futex/core.c-749-/*

Urgh, thanks!

Confirmed, the below cures things. Although I should probably make that
FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree.

Let me go do a proper patch.

---
 kernel/futex/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index d1d7b3c175a4..e7793f0d5757 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -687,7 +687,7 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
 	owner = uval & FUTEX_TID_MASK;
 
 	if (pending_op && !pi && !owner) {
-		futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+		futex_wake(uaddr, FLAGS_SHARED, 1, FUTEX_BITSET_MATCH_ANY);
 		return 0;
 	}
 
@@ -740,7 +740,7 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
 	 * PI futexes happens in exit_pi_state():
 	 */
 	if (!pi && (uval & FUTEX_WAITERS))
-		futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+		futex_wake(uaddr, FLAGS_SHARED, 1, FUTEX_BITSET_MATCH_ANY);
 
 	return 0;
 }

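The failure mode discussed above comes down to a literal `1` that only meant "shared" while FLAGS_SHARED was 0x01. Below is a minimal userspace sketch of that effect, assuming only the old and new values shown in the experimental diff (0x01 and 0x10). The `SKETCH_` macros and `sketch_is_shared()` are hypothetical names for illustration; this is not kernel code.

```c
#include <stdbool.h>

/* Values copied from the experimental diff quoted in the thread. */
#define SKETCH_OLD_FLAGS_SHARED 0x01u   /* FLAGS_SHARED before the change */
#define SKETCH_NEW_FLAGS_SHARED 0x10u   /* FLAGS_SHARED after the change  */

/* Hypothetical helper: does this flags word have the "shared" bit set,
   given where that bit currently lives?  */
bool sketch_is_shared(unsigned int flags, unsigned int shared_bit)
{
    return (flags & shared_bit) != 0;
}
```

Passing the literal `1` satisfies the check against the old value but not against the new one, whereas passing the symbolic constant works under either layout. That is consistent with the patch replacing `1` with FLAGS_SHARED at both call sites.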
* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-14 20:14         ` Peter Zijlstra
@ 2023-11-15  1:11           ` Edgecombe, Rick P
  2023-11-15  8:51             ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Edgecombe, Rick P @ 2023-11-15  1:11 UTC
To: peterz, fweimer
Cc: xry111, andrealmeid, linux-api, linux-mm, libc-alpha,
	linux-kernel, tglx, linux-arch

On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote:
> Urgh, thanks!
>
> Confirmed, the below cures things. Although I should probably make that
> FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree.
>
> Let me go do a proper patch.

I saw these fail on the glibc shadow stack branch today, and I also saw
this one failing:
FAIL: nptl/tst-robustpi8

It spit out:
mutex_timedlock of 41 in thread 1 failed with 22
child did not die of a signal in round 1

After the fix here I saw the others pass, but still not tst-robustpi8.
Not sure if it is some shadow stack complication. I can try to dig in
tomorrow if the problem doesn't jump out.

* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-15  1:11           ` Edgecombe, Rick P
@ 2023-11-15  8:51             ` Peter Zijlstra
  2023-11-15 23:28               ` Edgecombe, Rick P
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2023-11-15  8:51 UTC
To: Edgecombe, Rick P
Cc: fweimer, xry111, andrealmeid, linux-api, linux-mm, libc-alpha,
	linux-kernel, tglx, linux-arch

On Wed, Nov 15, 2023 at 01:11:20AM +0000, Edgecombe, Rick P wrote:
> On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote:
> > Urgh, thanks!
> >
> > Confirmed, the below cures things. Although I should probably make that
> > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree.
> >
> > Let me go do a proper patch.
>
> I saw these fail on the glibc shadow stack branch today, and I also saw
> this one failing:
> FAIL: nptl/tst-robustpi8

tip/locking/urgent (branch with the fix on) gets me:

root@ivb-ep:/usr/local/src/glibc# ./build/nptl/tst-robustpi8
running child
verifying locks
running child
verifying locks
running child
verifying locks
running child
verifying locks
running child
verifying locks
root@ivb-ep:/usr/local/src/glibc#

Which, to my untrained eye, looks like a pass to me.

* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-15  8:51             ` Peter Zijlstra
@ 2023-11-15 23:28               ` Edgecombe, Rick P
  2023-11-17  1:22                 ` Edgecombe, Rick P
  0 siblings, 1 reply; 14+ messages in thread
From: Edgecombe, Rick P @ 2023-11-15 23:28 UTC
To: peterz
Cc: xry111, andrealmeid, fweimer, linux-mm, libc-alpha, linux-kernel,
	tglx, linux-api, linux-arch

On Wed, 2023-11-15 at 09:51 +0100, Peter Zijlstra wrote:
> On Wed, Nov 15, 2023 at 01:11:20AM +0000, Edgecombe, Rick P wrote:
> > On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote:
> > > Urgh, thanks!
> > >
> > > Confirmed, the below cures things. Although I should probably make that
> > > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree.
> > >
> > > Let me go do a proper patch.
> >
> > I saw these fail on the glibc shadow stack branch today, and I also saw
> > this one failing:
> > FAIL: nptl/tst-robustpi8
>
> tip/locking/urgent (branch with the fix on) gets me:
>
> root@ivb-ep:/usr/local/src/glibc# ./build/nptl/tst-robustpi8
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> running child
> verifying locks
> root@ivb-ep:/usr/local/src/glibc#
>
> Which, to my untrained eye, looks like a pass to me.

It bisects to this for me:
fbeb558b0dd0 ("futex/pi: Fix recursive rt_mutex waiter state")

Reading the patch, I'm not immediately clear what is going on but a few
comments stood out: "There be dragons here" "What could possibly go
wrong..." "This is a somewhat dangerous proposition".

Seems a likelihood of some race, but it reproduces reliably on my
machine. Haven't dug into debugging it yet. Any pointers?

* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-15 23:28               ` Edgecombe, Rick P
@ 2023-11-17  1:22                 ` Edgecombe, Rick P
  2024-01-19 13:56                   ` Stefan Liebler
  0 siblings, 1 reply; 14+ messages in thread
From: Edgecombe, Rick P @ 2023-11-17  1:22 UTC
To: peterz
Cc: xry111, andrealmeid, fweimer, linux-mm, libc-alpha, linux-kernel,
	tglx, linux-api, linux-arch

A bit more info...

The error returned to userspace is originating from:
https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295

'uval' is often zero in that error case, but sometimes just a
mismatching value like: uval=0x567, task_pid_vnr()=0x564

Depending on the number of CPUs the VM is running on it reproduces or
not. When it does reproduce, the newly added path here is taken:
https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185
The path is taken a lot during the test, sometimes >400 times before
the above linked error is generated during the syscall. When it doesn't
reproduce, I never saw that new path taken.

More print statements make the reproduction less reliable, so it does
seem to have a race in the mix at least somewhat. Otherwise, I haven't
tried to understand what is going on here with all this highwire
locking.

Hope it helps.

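The message above describes a futex word whose owner field does not match the expected task ID. Below is a rough userspace illustration of that kind of comparison. It assumes the owner TID is extracted by masking with FUTEX_TID_MASK, as in the `owner = uval & FUTEX_TID_MASK` line quoted earlier in the thread. The `sketch_` names are hypothetical, and the code does not claim to reproduce the actual check in kernel/futex/pi.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as FUTEX_TID_MASK in the Linux uapi header <linux/futex.h>;
   defined locally so that the sketch builds without kernel headers.  */
#define SKETCH_FUTEX_TID_MASK 0x3fffffffu

/* Hypothetical helper: extract the owner TID from a futex word,
   dropping the two status bits above the TID field.  */
uint32_t sketch_owner_tid(uint32_t uval)
{
    return uval & SKETCH_FUTEX_TID_MASK;
}

/* Hypothetical helper: does the futex word name the expected owner?  */
bool sketch_owner_matches(uint32_t uval, uint32_t expected_tid)
{
    return sketch_owner_tid(uval) == expected_tid;
}
```

With the values reported above, `sketch_owner_matches(0x567, 0x564)` is false, and a zero `uval` names no owner at all. Both are the sort of inconsistency that is reported to userspace as EINVAL.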
* Re: Several tst-robust* tests time out with recent Linux kernel
  2023-11-17  1:22                 ` Edgecombe, Rick P
@ 2024-01-19 13:56                   ` Stefan Liebler
  2024-01-22 14:34                     ` Stefan Liebler
  2024-01-29 22:23                     ` Edgecombe, Rick P
  0 siblings, 2 replies; 14+ messages in thread
From: Stefan Liebler @ 2024-01-19 13:56 UTC
To: Edgecombe, Rick P, peterz
Cc: xry111, andrealmeid, fweimer, linux-mm, libc-alpha, linux-kernel,
	tglx, linux-api, linux-arch, Heiko Carstens, Sven Schnelle

[-- Attachment #1: Type: text/plain, Size: 2794 bytes --]

On 17.11.23 02:22, Edgecombe, Rick P wrote:
> A bit more info...
>
> The error returned to userspace is originating from:
> https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295
>
> 'uval' is often zero in that error case, but sometimes just a
> mismatching value like: uval=0x567, task_pid_vnr()=0x564
>
>
> Depending on the number of CPUs the VM is running on it reproduces or
> not. When it does reproduce, the newly added path here is taken:
> https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185
> The path is taken a lot during the test, sometimes >400 times before
> the above linked error is generated during the syscall. When it doesn't
> reproduce, I never saw that new path taken.
>
> More print statements make the reproduction less reliable, so it does
> seem to have a race in the mix at least somewhat. Otherwise, I haven't
> tried to understand what is going on here with all this highwire
> locking.
>
> Hope it helps.
Hi,

I've also observed failures in the glibc testcase nptl/tst-robust8pi with:
mutex_timedlock of 66 in thread 7 failed with 22
=> pthread_mutex_timedlock returns 22=EINVAL

I saw it on s390x. There I used a kernel with
commit 120d99901eb288f1d21db3976df4ba347b28f9c7
s390/vfio-ap: do not reset queue removed from host config

But I also saw it on an x86_64 kvm-guest with Fedora 39 and the
copr-repository with a vanilla kernel:
Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
SMP PREEMPT_DYNAMIC Sun Jan  7 06:17:30 UTC 2024 x86_64 GNU/Linux

I reported it to libc-alpha ("FAILING nptl/tst-robust8pi"
https://sourceware.org/pipermail/libc-alpha/2024-January/154150.html)
where Florian Weimer pointed me to this thread.

I've reduced the test (see attachment) and now have only one process
with three threads. I only use one mutex with attributes like the
original testcase: PTHREAD_MUTEX_ROBUST_NP, PTHREAD_PROCESS_SHARED,
PTHREAD_PRIO_INHERIT.
Every thread is doing a loop with pthread_mutex_timedlock(abstime={0,0})
and if locked, pthread_mutex_unlock.

I've added some uprobes before and after the futex-syscall in
__futex_lock_pi64(in pthread_mutex_timedlock) and futex_unlock_pi(in
pthread_mutex_unlock). For me __ASSUME_FUTEX_LOCK_PI2 is not available,
but __ASSUME_TIME64_SYSCALLS is defined.

For me it looks like this (simplified uprobes trace):
<thread> <timestamp>: <probe>
t1 4309589.419744: before syscall in __futex_lock_pi64

t3 4309589.419745: before syscall in futex_unlock_pi
t2 4309589.419745: before syscall in __futex_lock_pi64

t3 4309589.419747: after syscall in futex_unlock_pi
t2 4309589.419747: after syscall in __futex_lock_pi64 ret=-22=EINVAL

t1 4309589.419748: after syscall in __futex_lock_pi64 ret=-110=ETIMEDOUT

Can you please have a look again?

Bye,
Stefan Liebler

[-- Attachment #2: tst-robust8pi-20240118.c --]
[-- Type: text/x-csrc, Size: 4161 bytes --]

//CFLAGS=-pthread
//LDFLAGS=-lpthread
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define NUM_THREADS 3
#define THREAD_FUNC thr_func
#define USE_BARRIER 1

#ifndef ROUNDS
# define ROUNDS 100000000
#endif

typedef struct thr_info
{
  int nr;
  pthread_t thread;
} __attribute__ ((aligned (256))) thr_info_t;

#define THR_INIT() \
  thr_info_t *thr = (thr_info_t *) arg;
#define THR_PRINTF(fmt, ...) \
  printf ("#%d: " fmt, thr->nr, __VA_ARGS__)
#define THR_PUTS(msg) \
  printf ("#%d: " msg "\n", thr->nr)

#if USE_BARRIER != 0
static pthread_barrier_t thrs_barrier;
#endif

static pthread_mutex_t mtx;
/* Absolute timeout in the past: timedlock never blocks.  */
static const struct timespec before = { 0, 0 };

/* ###################################################################
   thread func
   ###############################################################  */
static void *
thr_func (void *arg)
{
  THR_INIT ();
  int state = 0;   /* 0: mutex not held, 1: mutex held.  */
  int fct;
#if 0
  /* 3 threads, 1xfct=0=pthread_mutex_lock,
     2xfct=1=pthread_mutex_timedlock: EINVAL.  */
  fct = (thr->nr + 1) % 2;
#elif 0
  /* 3 threads, 2xfct=0=pthread_mutex_lock,
     1xfct=1=pthread_mutex_timedlock: no fails.  */
  fct = (thr->nr) % 2;
#elif 1
  /* >3 threads, fct=1=only pthread_mutex_timedlock: EINVAL.  */
  fct = 1;
#endif
  int round = 0;

  THR_PRINTF ("started: fct=%d\n", fct);

#if USE_BARRIER != 0
  pthread_barrier_wait (&thrs_barrier);
#endif

  while (1)
    {
      if (state == 0)
        {
          round ++;
          int e;
          switch (fct)
            {
            case 0:
              e = pthread_mutex_lock (&mtx);
              if (e != 0)
                {
                  THR_PRINTF ("mutex_lock failed with %d (round=%d)\n",
                              e, round);
                  exit (1);
                }
              state = 1;
              break;
            case 1:
              e = pthread_mutex_timedlock (&mtx, &before);
              if (e != 0 && e != ETIMEDOUT)
                {
                  THR_PRINTF ("mutex_timedlock failed with %d (round=%d)\n",
                              e, round);
                  exit (1);
                }
              break;
            default:
              e = pthread_mutex_trylock (&mtx);
              if (e != 0 && e != EBUSY)
                {
                  THR_PRINTF ("mutex_trylock failed with %d (round=%d)\n",
                              e, round);
                  exit (1);
                }
              break;
            }

          if (e == EOWNERDEAD)
            pthread_mutex_consistent (&mtx);

          if (e == 0 || e == EOWNERDEAD)
            state = 1;
        }
      else
        {
          int e = pthread_mutex_unlock (&mtx);
          if (e != 0)
            {
              THR_PRINTF ("mutex_unlock failed with %d (round=%d)\n",
                          e, round);
              exit (1);
            }
          state = 0;
        }

      if (round >= ROUNDS)
        {
          THR_PRINTF ("REACHED round %d. => exit\n", ROUNDS);
          if (state != 0)
            {
              int e = pthread_mutex_unlock (&mtx);
              if (e != 0)
                {
                  THR_PRINTF ("mutex_unlock@exit failed with %d (round=%d)\n",
                              e, round);
                  exit (1);
                }
            }
          break;
        }
    }

  return NULL;
}

int
main (void)
{
  int i;
  printf ("main: start %d threads.\n", NUM_THREADS);

#if USE_BARRIER != 0
  pthread_barrier_init (&thrs_barrier, NULL, NUM_THREADS + 1);
#endif

  pthread_mutexattr_t ma;
  if (pthread_mutexattr_init (&ma) != 0)
    {
      puts ("mutexattr_init failed");
      return 1;
    }
  if (pthread_mutexattr_setrobust (&ma, PTHREAD_MUTEX_ROBUST_NP) != 0)
    {
      puts ("mutexattr_setrobust failed");
      return 1;
    }
  if (pthread_mutexattr_setpshared (&ma, PTHREAD_PROCESS_SHARED) != 0)
    {
      puts ("mutexattr_setpshared failed");
      return 1;
    }
  if (pthread_mutexattr_setprotocol (&ma, PTHREAD_PRIO_INHERIT) != 0)
    {
      puts ("pthread_mutexattr_setprotocol failed");
      return 1;
    }
  if (pthread_mutex_init (&mtx, &ma) != 0)
    {
      puts ("pthread_mutex_init failed");
      return 1;
    }

  thr_info_t thrs[NUM_THREADS];
  for (i = 0; i < NUM_THREADS; i++)
    {
      thrs[i].nr = i;
      /* Call pthread_create outside assert so it also runs with -DNDEBUG.  */
      if (pthread_create (&(thrs[i].thread), NULL, THREAD_FUNC,
                          &(thrs[i])) != 0)
        {
          puts ("pthread_create failed");
          return 1;
        }
    }

#if USE_BARRIER != 0
  /* All threads start work after this barrier.  */
  pthread_barrier_wait (&thrs_barrier);
#endif

  for (i = 0; i < NUM_THREADS; i++)
    {
      pthread_join (thrs[i].thread, NULL);
    }

#if USE_BARRIER != 0
  pthread_barrier_destroy (&thrs_barrier);
#endif

  printf ("main: end.\n");
  return EXIT_SUCCESS;
}

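The uprobes trace in the message above shows raw futex syscall return values (-22, -110), while pthread_mutex_timedlock reports the corresponding positive error number (22). Below is a small sketch of how such raw values can be decoded by name, assuming the Linux convention that a failing syscall returns the negated errno value. `sketch_ret_name()` is a hypothetical helper, not part of glibc or the test.

```c
#include <errno.h>

/* Hypothetical helper: map a raw syscall return value (negative errno on
   failure) to a short name.  Only the codes that appear in this thread
   are handled; everything else is reported as "other".  */
const char *sketch_ret_name(long ret)
{
    if (ret >= 0)
        return "success";

    switch (-ret) {
    case EINVAL:     return "EINVAL";      /* the unexpected failure      */
    case ETIMEDOUT:  return "ETIMEDOUT";   /* expected with abstime={0,0} */
    case EOWNERDEAD: return "EOWNERDEAD";  /* previous owner died         */
    default:         return "other";
    }
}
```

On the architectures mentioned in the thread (x86_64 and s390x), `sketch_ret_name(-22)` yields "EINVAL" and `sketch_ret_name(-110)` yields "ETIMEDOUT", matching the annotations in the trace.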
* Re: Several tst-robust* tests time out with recent Linux kernel
  2024-01-19 13:56                   ` Stefan Liebler
@ 2024-01-22 14:34                     ` Stefan Liebler
  2024-01-29 22:23                     ` Edgecombe, Rick P
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Liebler @ 2024-01-22 14:34 UTC
To: libc-alpha

On 19.01.24 14:56, Stefan Liebler wrote:
> On 17.11.23 02:22, Edgecombe, Rick P wrote:
>> A bit more info...
>>
>> The error returned to userspace is originating from:
>> https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295
>>
>> 'uval' is often zero in that error case, but sometimes just a
>> mismatching value like: uval=0x567, task_pid_vnr()=0x564
>>
>>
>> Depending on the number of CPUs the VM is running on it reproduces or
>> not. When it does reproduce, the newly added path here is taken:
>> https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185
>> The path is taken a lot during the test, sometimes >400 times before
>> the above linked error is generated during the syscall. When it doesn't
>> reproduce, I never saw that new path taken.
>>
>> More print statements make the reproduction less reliable, so it does
>> seem to have a race in the mix at least somewhat. Otherwise, I haven't
>> tried to understand what is going on here with all this highwire
>> locking.
>>
>> Hope it helps.
> Hi,
>
> I've also observed failures in the glibc testcase nptl/tst-robust8pi with:
> mutex_timedlock of 66 in thread 7 failed with 22
> => pthread_mutex_timedlock returns 22=EINVAL
>
> I saw it on s390x. There I used a kernel with
> commit 120d99901eb288f1d21db3976df4ba347b28f9c7
> s390/vfio-ap: do not reset queue removed from host config
>
> But I also saw it on an x86_64 kvm-guest with Fedora 39 and the
> copr-repository with a vanilla kernel:
> Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1
> SMP PREEMPT_DYNAMIC Sun Jan  7 06:17:30 UTC 2024 x86_64 GNU/Linux
>
> I reported it to libc-alpha ("FAILING nptl/tst-robust8pi"
> https://sourceware.org/pipermail/libc-alpha/2024-January/154150.html)
> where Florian Weimer pointed me to this thread.
>
> I've reduced the test (see attachment) and now have only one process
> with three threads. I only use one mutex with attributes like the
> original testcase: PTHREAD_MUTEX_ROBUST_NP, PTHREAD_PROCESS_SHARED,
> PTHREAD_PRIO_INHERIT.
> Every thread is doing a loop with pthread_mutex_timedlock(abstime={0,0})
> and if locked, pthread_mutex_unlock.
>
> I've added some uprobes before and after the futex-syscall in
> __futex_lock_pi64(in pthread_mutex_timedlock) and futex_unlock_pi(in
> pthread_mutex_unlock). For me __ASSUME_FUTEX_LOCK_PI2 is not available,
> but __ASSUME_TIME64_SYSCALLS is defined.
>
> For me it looks like this (simplified uprobes trace):
> <thread> <timestamp>: <probe>
> t1 4309589.419744: before syscall in __futex_lock_pi64
>
> t3 4309589.419745: before syscall in futex_unlock_pi
> t2 4309589.419745: before syscall in __futex_lock_pi64
>
> t3 4309589.419747: after syscall in futex_unlock_pi
> t2 4309589.419747: after syscall in __futex_lock_pi64 ret=-22=EINVAL
>
> t1 4309589.419748: after syscall in __futex_lock_pi64 ret=-110=ETIMEDOUT
>
> Can you please have a look again?
>
> Bye,
> Stefan Liebler

FYI, the kernel commit
"futex: Prevent the reuse of stale pi_state"
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=e626cb02ee8399fd42c415e542d031d185783903
fixes my issue on s390x.

* Re: Several tst-robust* tests time out with recent Linux kernel
  2024-01-19 13:56                   ` Stefan Liebler
  2024-01-22 14:34                     ` Stefan Liebler
@ 2024-01-29 22:23                     ` Edgecombe, Rick P
  2024-01-30 10:12                       ` Stefan Liebler
  1 sibling, 1 reply; 14+ messages in thread
From: Edgecombe, Rick P @ 2024-01-29 22:23 UTC
To: peterz, stli
Cc: xry111, hca, andrealmeid, fweimer, linux-mm, libc-alpha,
	linux-kernel, tglx, svens, linux-api, linux-arch

On Fri, 2024-01-19 at 14:56 +0100, Stefan Liebler wrote:
> I've reduced the test (see attachment) and now have only one process
> with three threads.

This test fails on my setup as well:
main: start 3 threads.
#0: started: fct=1
#1: started: fct=1
#2: started: fct=1
#2: mutex_timedlock failed with 22 (round=28772)

But, after this patch:
https://lore.kernel.org/all/20240116130810.ji1YCxpg@linutronix.de/

...the attached test hangs.

However, the glibc test that was failing for me "nptl/tst-robustpi8"
passes with the linked patch applied. So I think that patch fixes the
issue I hit.

What is passing supposed to look like on the attached test?

* Re: Several tst-robust* tests time out with recent Linux kernel
  2024-01-29 22:23                     ` Edgecombe, Rick P
@ 2024-01-30 10:12                       ` Stefan Liebler
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Liebler @ 2024-01-30 10:12 UTC
To: Edgecombe, Rick P, peterz
Cc: xry111, hca, andrealmeid, fweimer, linux-mm, libc-alpha,
	linux-kernel, tglx, svens, linux-api, linux-arch

On 29.01.24 23:23, Edgecombe, Rick P wrote:
> On Fri, 2024-01-19 at 14:56 +0100, Stefan Liebler wrote:
>> I've reduced the test (see attachment) and now have only one process
>> with three threads.
>
> This test fails on my setup as well:
> main: start 3 threads.
> #0: started: fct=1
> #1: started: fct=1
> #2: started: fct=1
> #2: mutex_timedlock failed with 22 (round=28772)
>
> But, after this patch:
> https://lore.kernel.org/all/20240116130810.ji1YCxpg@linutronix.de/
>
> ...the attached test hangs.
>
> However, the glibc test that was failing for me "nptl/tst-robustpi8"
> passes with the linked patch applied. So I think that patch fixes the
> issue I hit.
>
> What is passing supposed to look like on the attached test?

The kernel commit
"futex: Prevent the reuse of stale pi_state"
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=e626cb02ee8399fd42c415e542d031d185783903
fixes the issue on s390x.

With this commit, the test runs to the end:
main: start 3 threads.
#0: started: fct=1
#1: started: fct=1
#2: started: fct=1
#2: REACHED round 100000000. => exit
#0: REACHED round 100000000. => exit
#1: REACHED round 100000000. => exit
main: end.

If you want, you can reduce the number of rounds by compiling with
-DROUNDS=XYZ or by manually adjusting the ROUNDS macro definition.
