* Fix nanosleep returning negative rem @ 2021-07-20 15:16 David Allsopp 2021-07-21 8:39 ` Corinna Vinschen 0 siblings, 1 reply; 8+ messages in thread From: David Allsopp @ 2021-07-20 15:16 UTC (permalink / raw) To: cygwin-patches [-- Attachment #1: Type: text/plain, Size: 1021 bytes --] I've pushed a repro case for this to https://github.com/dra27/cygwin-nanosleep-bug.git Originally noticed as the main CI system for OCaml has been failing sporadically for the signal.ml test mentioned in that repo. This morning I tried hammering that test on my dev machine and discovered that it fails very frequently. No idea if that's drivers, Windows 10 updates, number of cores or what, but it was definitely happening, and easily. Drilling further, it appears that NtQueryTimer is able to return a negative value in the TimeRemaining field even when SignalState is false. The values I've seen have always been < 15ms - i.e. less than the timer resolution, so I wonder if there is a point at which the timer has elapsed but has not been signalled, but WaitForMultipleObjects returns because of the EINTR signal. Mildly surprising that it seems to be so reproducible. Anyway, a patch is attached which simply guards a negative return value. The test on tbi.SignalState is in theory unnecessary. All best, David [-- Attachment #2: 0001-Ensure-nanosleep-2-never-returns-negative-rem.patch --] [-- Type: application/octet-stream, Size: 1911 bytes --] From 24f5adc3ac8246d93582ac6e4b2779b369f8b3f1 Mon Sep 17 00:00:00 2001 From: David Allsopp <david.allsopp@metastack.com> Date: Tue, 20 Jul 2021 16:07:00 +0100 Subject: [PATCH] Ensure nanosleep(2) never returns negative rem It appears to be the case that NtQueryTimer can return a negative time remaining for an unsignalled timer. The value appears to be less than the timer resolution. Signed-off-by: David Allsopp <david.allsopp@metastack.com> --- winsup/cygwin/cygwait.cc | 6 ++++-- winsup/cygwin/release/3.2.1 | 4 ++++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/winsup/cygwin/cygwait.cc b/winsup/cygwin/cygwait.cc index 1d6c7c9cc..dbbe1db6e 100644 --- a/winsup/cygwin/cygwait.cc +++ b/winsup/cygwin/cygwait.cc @@ -104,8 +104,10 @@ cygwait (HANDLE object, PLARGE_INTEGER timeout, unsigned mask) sizeof tbi, NULL); /* if timer expired, TimeRemaining is negative and represents the system uptime when signalled */ - if (timeout->QuadPart < 0LL) - timeout->QuadPart = tbi.SignalState ? 0LL : tbi.TimeRemaining.QuadPart; + if (timeout->QuadPart < 0LL) { + timeout->QuadPart = tbi.SignalState || tbi.TimeRemaining.QuadPart < 0LL + ? 0LL : tbi.TimeRemaining.QuadPart; + } NtCancelTimer (_my_tls.locals.cw_timer, NULL); } diff --git a/winsup/cygwin/release/3.2.1 b/winsup/cygwin/release/3.2.1 index 4f4db622a..2a339718c 100644 --- a/winsup/cygwin/release/3.2.1 +++ b/winsup/cygwin/release/3.2.1 @@ -44,3 +44,7 @@ Bug Fixes AF_UNSPEC. As specified by POSIX and Linux, this is allowed on datagram sockets, and its effect is to reset the socket's peer address. + +- Fix nanosleep(2) returning negative rem. NtQueryTimer appears to be able to + return a negative remaining time (less than the timer resolution) for + unsignalled timers. -- 2.29.2.windows.2 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix nanosleep returning negative rem 2021-07-20 15:16 Fix nanosleep returning negative rem David Allsopp @ 2021-07-21 8:39 ` Corinna Vinschen 2021-07-21 9:07 ` David Allsopp 0 siblings, 1 reply; 8+ messages in thread From: Corinna Vinschen @ 2021-07-21 8:39 UTC (permalink / raw) To: cygwin-patches Hi David, On Jul 20 16:16, David Allsopp wrote: > I've pushed a repro case for this to > https://github.com/dra27/cygwin-nanosleep-bug.git > > Originally noticed as the main CI system for OCaml has been failing > sporadically for the signal.ml test mentioned in that repo. This morning I > tried hammering that test on my dev machine and discovered that it fails > very frequently. No idea if that's drivers, Windows 10 updates, number of > cores or what, but it was definitely happening, and easily. > > Drilling further, it appears that NtQueryTimer is able to return a negative > value in the TimeRemaining field even when SignalState is false. The values > I've seen have always been < 15ms - i.e. less than the timer resolution, so > I wonder if there is a point at which the timer has elapsed but has not been > signalled, but WaitForMultipleObjects returns because of the EINTR signal. > Mildly surprising that it seems to be so reproducible. > > Anyway, a patch is attached which simply guards a negative return value. The > test on tbi.SignalState is in theory unnecessary. Thanks for the patch, I think your patch is fine. However, I'd like to dig a bit into this to see what exactly happens. Do you have a very simple testcase in plain C, by any chance? Thanks, Corinna ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: Fix nanosleep returning negative rem 2021-07-21 8:39 ` Corinna Vinschen @ 2021-07-21 9:07 ` David Allsopp 2021-07-21 9:30 ` Corinna Vinschen 0 siblings, 1 reply; 8+ messages in thread From: David Allsopp @ 2021-07-21 9:07 UTC (permalink / raw) To: cygwin-patches > On Jul 20 16:16, David Allsopp wrote: > > I've pushed a repro case for this to > > https://github.com/dra27/cygwin-nanosleep-bug.git > > > > Originally noticed as the main CI system for OCaml has been failing > > sporadically for the signal.ml test mentioned in that repo. This > > morning I tried hammering that test on my dev machine and discovered > > that it fails very frequently. No idea if that's drivers, Windows 10 > > updates, number of cores or what, but it was definitely happening, and > > easily. > > > > Drilling further, it appears that NtQueryTimer is able to return a > > negative value in the TimeRemaining field even when SignalState is > > false. The values I've seen have always been < 15ms - i.e. less than > > the timer resolution, so I wonder if there is a point at which the > > timer has elapsed but has not been signalled, but WaitForMultipleObjects > returns because of the EINTR signal. > > Mildly surprising that it seems to be so reproducible. > > > > Anyway, a patch is attached which simply guards a negative return > > value. The test on tbi.SignalState is in theory unnecessary. > > Thanks for the patch, I think your patch is fine. However, I'd like to > dig a bit into this to see what exactly happens. Do you have a very > simple testcase in plain C, by any chance? https://github.com/dra27/cygwin-nanosleep-bug/blob/main/signal.c was as simple as I'd gone at this stage (eliminating OCaml from the equation!). It might be possible to get it to happen without all the pthreads stuff: having confirmed it definitely wasn't OCaml and been able to put the appropriate system_printf's into cygwait to see that NtQueryTimer really was returning this small negative value, I stopped simplifying. Does that repro case trigger on your system too? Best, D ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix nanosleep returning negative rem 2021-07-21 9:07 ` David Allsopp @ 2021-07-21 9:30 ` Corinna Vinschen 2021-07-21 9:33 ` Corinna Vinschen 0 siblings, 1 reply; 8+ messages in thread From: Corinna Vinschen @ 2021-07-21 9:30 UTC (permalink / raw) To: cygwin-patches On Jul 21 09:07, David Allsopp wrote: > > On Jul 20 16:16, David Allsopp wrote: > > > I've pushed a repro case for this to > > > https://github.com/dra27/cygwin-nanosleep-bug.git > > > > > > Originally noticed as the main CI system for OCaml has been failing > > > sporadically for the signal.ml test mentioned in that repo. This > > > morning I tried hammering that test on my dev machine and discovered > > > that it fails very frequently. No idea if that's drivers, Windows 10 > > > updates, number of cores or what, but it was definitely happening, and > > > easily. > > > > > > Drilling further, it appears that NtQueryTimer is able to return a > > > negative value in the TimeRemaining field even when SignalState is > > > false. The values I've seen have always been < 15ms - i.e. less than > > > the timer resolution, so I wonder if there is a point at which the > > > timer has elapsed but has not been signalled, but WaitForMultipleObjects > > returns because of the EINTR signal. > > > Mildly surprising that it seems to be so reproducible. > > > > > > Anyway, a patch is attached which simply guards a negative return > > > value. The test on tbi.SignalState is in theory unnecessary. > > > > Thanks for the patch, I think your patch is fine. However, I'd like to > > dig a bit into this to see what exactly happens. Do you have a very > > simple testcase in plain C, by any chance? > > https://github.com/dra27/cygwin-nanosleep-bug/blob/main/signal.c was > as simple as I'd gone at this stage (eliminating OCaml from the > equation!). It might be possible to get it to happen without all the > pthreads stuff: having confirmed it definitely wasn't OCaml and been > able to put the appropriate system_printf's into cygwait to see that > NtQueryTimer really was returning this small negative value, I stopped > simplifying. > > Does that repro case trigger on your system too? I'm not sure. Would the output " - nanosleep failed: ..." indicate the bug has been triggered? If so, no, I can't reproduce this on my system. I wrote a quick STC using the NT API calls and I can't reproduce the problem with this code either. The output is either SignalState: 1 TimeRemaining: -5354077459183 or SignalState: 0 TimeRemaining: 653 I never get a small negative value in the latter case. Can you reproduce your problem with this testcase or tweak it to reproduce it? Either way, your patch as safe guard should be ok. Thanks, Corinna ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix nanosleep returning negative rem 2021-07-21 9:30 ` Corinna Vinschen @ 2021-07-21 9:33 ` Corinna Vinschen 2021-07-21 16:02 ` David Allsopp 0 siblings, 1 reply; 8+ messages in thread From: Corinna Vinschen @ 2021-07-21 9:33 UTC (permalink / raw) To: cygwin-patches [-- Attachment #1: Type: text/plain, Size: 457 bytes --] On Jul 21 11:30, Corinna Vinschen wrote: > I wrote a quick STC using the NT API calls and I can't reproduce the > problem with this code either. The output is either > > SignalState: 1 TimeRemaining: -5354077459183 > > or > > SignalState: 0 TimeRemaining: 653 > > I never get a small negative value in the latter case. Can you > reproduce your problem with this testcase or tweak it to reproduce it? Now I actually attached the code :} Corinna [-- Attachment #2: timer.c --] [-- Type: text/plain, Size: 1580 bytes --] #include <stdio.h> #include <w32api/windows.h> #include <w32api/winternl.h> #include <w32api/ntdef.h> typedef enum _TIMER_INFORMATION_CLASS { TimerBasicInformation = 0 } TIMER_INFORMATION_CLASS, *PTIMER_INFORMATION_CLASS; typedef struct _TIMER_BASIC_INFORMATION { LARGE_INTEGER TimeRemaining; BOOLEAN SignalState; } TIMER_BASIC_INFORMATION, *PTIMER_BASIC_INFORMATION; NTSTATUS NTAPI NtCreateTimer(PHANDLE, ACCESS_MASK, POBJECT_ATTRIBUTES, TIMER_TYPE); NTSTATUS NTAPI NtCancelTimer(HANDLE, PBOOLEAN); NTSTATUS NTAPI NtSetTimer(HANDLE, PLARGE_INTEGER, PVOID, PVOID, BOOLEAN, LONG, PBOOLEAN); NTSTATUS NTAPI NtQueryTimer (HANDLE, TIMER_INFORMATION_CLASS, PVOID, ULONG, PULONG); int main () { HANDLE event; HANDLE timer; NTSTATUS status; LARGE_INTEGER timeout; TIMER_BASIC_INFORMATION tbi; event = CreateEvent (NULL, TRUE, FALSE, NULL); status = NtCreateTimer (&timer, TIMER_ALL_ACCESS, NULL, NotificationTimer); if (!NT_SUCCESS (status)) { fprintf (stderr, "NtCreateTimer: 0x%08lx\n", status); return 1; } timeout.QuadPart = -10000000LL; /* 1 sec */ status = NtSetTimer (timer, &timeout, NULL, NULL, FALSE, 0, NULL); if (!NT_SUCCESS (status)) { fprintf (stderr, "NtSetTimer: 0x%08lx\n", status); return 1; } WaitForSingleObject (event, 985L); NtQueryTimer (timer, TimerBasicInformation, &tbi, sizeof tbi, NULL); printf ("SignalState: %d TimeRemaining: %lld\n", tbi.SignalState, tbi.TimeRemaining.QuadPart); NtCancelTimer (timer, NULL); NtClose (timer); return 0; } ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: Fix nanosleep returning negative rem 2021-07-21 9:33 ` Corinna Vinschen @ 2021-07-21 16:02 ` David Allsopp 2021-07-21 17:57 ` Lavrentiev, Anton (NIH/NLM/NCBI) [C] 2021-07-22 8:01 ` Corinna Vinschen 0 siblings, 2 replies; 8+ messages in thread From: David Allsopp @ 2021-07-21 16:02 UTC (permalink / raw) To: cygwin-patches Corinna Vinschen wrote: > Sent: 21 July 2021 10:33 > To: cygwin-patches@cygwin.com > Subject: Re: Fix nanosleep returning negative rem > > On Jul 21 11:30, Corinna Vinschen wrote: > > I wrote a quick STC using the NT API calls and I can't reproduce the > > problem with this code either. The output is either > > > > SignalState: 1 TimeRemaining: -5354077459183 > > > > or > > > > SignalState: 0 TimeRemaining: 653 > > > > I never get a small negative value in the latter case. Can you > > reproduce your problem with this testcase or tweak it to reproduce it? > > Now I actually attached the code :} :) Yes, I can reproduce - I didn't even need a loop! Third time: dra@Thor /cygdrive/c/Scratch/nanosleep $ ./timer SignalState: 0 TimeRemaining: -1151 That said, I can get it easily get this on my desktop (AMD Ryzen Threadripper 3990X) but not at all on my laptop (Intel Core i7-8650U). On the laptop, ignoring the couple of signalled cases, 747 runs of timer.c give values between 131597-149947 with a very narrow SD (~4000) whereas on the AMD chip, 738 runs gives a range of -2722 to 149896 with a relatively wider SD of ~23000. The CI system where this was first seen is an virtualised Intel system so it doesn't appear to be as simple as CPU manufacturer or even core count. That said, I'm not at all familiar with the details of how this works, but I expect the timer for these things is part of the chipset, not the CPU?! Best, D ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: Fix nanosleep returning negative rem 2021-07-21 16:02 ` David Allsopp @ 2021-07-21 17:57 ` Lavrentiev, Anton (NIH/NLM/NCBI) [C] 2021-07-22 8:01 ` Corinna Vinschen 1 sibling, 0 replies; 8+ messages in thread From: Lavrentiev, Anton (NIH/NLM/NCBI) [C] @ 2021-07-21 17:57 UTC (permalink / raw) To: David Allsopp, cygwin-patches > I can get it easily get this on my desktop (AMD Ryzen Threadripper 3990X) but not at all on my laptop (Intel Core i7-8650U) Not sure if that's really related but: Have you checked what is your default Windows timer resolution? Some applications change it from the default 100HZ to 1000HZ (by e.g. using timeBeginPeriod(1) -- MS Edge does this, and Chrome, I think, but not Firefox -- yet the change is actually _global_ as documented). The following code shows the timer resolution: #include <windows.h> #include <timeapi.h> #include <stdio.h> #include <synchapi.h> int main() { unsigned int prev = timeGetTime(), next; Sleep(1); next = timeGetTime(); printf("Sleep(1) = %u\n", next - prev); return 0; } Output: Sleep(1) = 10 means the default 100HZ Sleep(1) = 1 means the increased resolution of 1ms Anton Lavrentiev Contractor NIH/NLM/NCBI P.S. timeBeginPeriod(): This function affects a global Windows setting. Windows uses the lowest value (that is, highest resolution) requested by any process. Setting a higher resolution can improve the accuracy of time-out intervals in wait functions. However, it can also reduce overall system performance, because the thread scheduler switches tasks more often. High resolutions can also prevent the CPU power management system from entering power-saving modes. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Fix nanosleep returning negative rem 2021-07-21 16:02 ` David Allsopp 2021-07-21 17:57 ` Lavrentiev, Anton (NIH/NLM/NCBI) [C] @ 2021-07-22 8:01 ` Corinna Vinschen 1 sibling, 0 replies; 8+ messages in thread From: Corinna Vinschen @ 2021-07-22 8:01 UTC (permalink / raw) To: cygwin-patches On Jul 21 16:02, David Allsopp wrote: > Corinna Vinschen wrote: > > Sent: 21 July 2021 10:33 > > To: cygwin-patches@cygwin.com > > Subject: Re: Fix nanosleep returning negative rem > > > > On Jul 21 11:30, Corinna Vinschen wrote: > > > I wrote a quick STC using the NT API calls and I can't reproduce the > > > problem with this code either. The output is either > > > > > > SignalState: 1 TimeRemaining: -5354077459183 > > > > > > or > > > > > > SignalState: 0 TimeRemaining: 653 > > > > > > I never get a small negative value in the latter case. Can you > > > reproduce your problem with this testcase or tweak it to reproduce it? > > > > Now I actually attached the code :} > > :) Yes, I can reproduce - I didn't even need a loop! Third time: > > dra@Thor /cygdrive/c/Scratch/nanosleep > $ ./timer > SignalState: 0 TimeRemaining: -1151 > > That said, I can get it easily get this on my desktop (AMD Ryzen > Threadripper 3990X) but not at all on my laptop (Intel Core i7-8650U). > On the laptop, ignoring the couple of signalled cases, 747 runs of > timer.c give values between 131597-149947 with a very narrow SD > (~4000) whereas on the AMD chip, 738 runs gives a range of -2722 to > 149896 with a relatively wider SD of ~23000. > > The CI system where this was first seen is an virtualised Intel system > so it doesn't appear to be as simple as CPU manufacturer or even core > count. Weird. I pushed your patch. > That said, I'm not at all familiar with the details of how this > works, but I expect the timer for these things is part of the chipset, > not the CPU?! As timers go, the timer behind the NtCreateTimer scenes is a simple interrupt based timer with a default resolution of 15 ms. Thanks, Corinna ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-07-22 8:01 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-20 15:16 Fix nanosleep returning negative rem David Allsopp 2021-07-21 8:39 ` Corinna Vinschen 2021-07-21 9:07 ` David Allsopp 2021-07-21 9:30 ` Corinna Vinschen 2021-07-21 9:33 ` Corinna Vinschen 2021-07-21 16:02 ` David Allsopp 2021-07-21 17:57 ` Lavrentiev, Anton (NIH/NLM/NCBI) [C] 2021-07-22 8:01 ` Corinna Vinschen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).