public inbox for cygwin-patches@cygwin.com
 help / color / mirror / Atom feed
* [PATCH 00/11] More testsuite fixes
@ 2023-07-13 11:38 Jon Turney
  2023-07-13 11:38 ` [PATCH 01/11] Cygwin: testsuite: Setup test prereqs in 'installation' the tests run in Jon Turney
                   ` (12 more replies)
  0 siblings, 13 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:38 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

This gets us from :

FAIL: cygload
FAIL: devdsp.c
FAIL: ltp/access05.c
FAIL: ltp/fcntl07.c
FAIL: ltp/symlink01.c
FAIL: ltp/symlink03.c
FAIL: ltp/umask03.c
FAIL: pthread/cancel11.c
FAIL: pthread/cancel3.c
FAIL: pthread/cancel5.c
FAIL: pthread/inherit1.c
FAIL: pthread/priority1.c
FAIL: pthread/priority2.c
FAIL: systemcall.c

to:

FAIL: cygload
FAIL: devdsp.c
FAIL: ltp/umask03.c
FAIL: pthread/cancel11.c
FAIL: pthread/priority1.c

Notes on the remaining failures:

cygload: I have no idea of the voodoo needed to make this work

devdsp: forkplaytest() is broken in some complex way I don't understand, causing a hang. It looks like the child 
process doesn't exit, which must be a bug in Cygwin?

devdsp: it's totally unclear to me if opening /dev/dsp twice is meant to be allowed or not

umask03: was utterly broken, now fails, but accurately reports the failure. I don't quite understand the 
permissions issue which is causing it to fail.

cancel11: some funkiness I can't work out, causing the save/restoring signal handlers around system() to not 
work correctly

priority1: this looks like a bug with pthread_create() and a non-null pthread_attr_t *, for which I'll send a 
patch shortly.

Jon Turney (11):
  Cygwin: testsuite: Setup test prereqs in 'installation' the tests run
    in
  Cygwin: testsuite: Add a simple timeout mechanism
  Cygwin: testsuite: Remove const from writable string in fcntl07b
  Cygwin: testsuite: Skip devdsp test when no audio devices present
  Cygwin: testsuite: Just log result of second open of /dev/dsp
  Cygwin: testsuite: Also check direct call in systemcall
  Cygwin: testsuite: Fix for limited thread priority values
  Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  Cygwin: testsuite: Fix a buffer overflow in symlink01
  Cygwin: testsuite: Minor fixes to umask03
  Cygwin: testsuite: Drop Adminstrator privileges while running tests

 .github/workflows/cygwin.yml                  |  4 ++-
 winsup/cygwin/Makefile.am                     |  4 +--
 winsup/doc/faq-programming.xml                |  4 ++-
 winsup/testsuite/Makefile.am                  | 28 ++++++++++++++-
 winsup/testsuite/cygrun.c                     | 17 +++++++--
 winsup/testsuite/winsup.api/devdsp.c          | 35 +++++++++++--------
 winsup/testsuite/winsup.api/ltp/fcntl07B.c    |  2 +-
 winsup/testsuite/winsup.api/ltp/symlink01.c   |  2 +-
 winsup/testsuite/winsup.api/ltp/umask03.c     | 21 ++++++-----
 winsup/testsuite/winsup.api/pthread/cancel3.c | 24 +++++++++----
 winsup/testsuite/winsup.api/pthread/cancel5.c | 24 +++++++++----
 .../testsuite/winsup.api/pthread/inherit1.c   | 21 ++++++++++-
 .../testsuite/winsup.api/pthread/priority1.c  | 24 +++++++++++--
 .../testsuite/winsup.api/pthread/priority2.c  | 22 ++++++++++--
 winsup/testsuite/winsup.api/systemcall.c      | 10 +++++-
 winsup/testsuite/winsup.api/winsup.exp        |  2 +-
 16 files changed, 194 insertions(+), 50 deletions(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/11] Cygwin: testsuite: Setup test prereqs in 'installation' the tests run in
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
@ 2023-07-13 11:38 ` Jon Turney
  2023-07-13 11:38 ` [PATCH 02/11] Cygwin: testsuite: Add a simple timeout mechanism Jon Turney
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:38 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

Do some setup in the Cygwin 'installation' at testsuite/testinst/:

* Ensure /tmp exists

* Use BusyBox to provide executables needed by tests which use system()
(sh, sleep, ls)

This enables tests which use system(), or require /tmp to exist to pass.
---
 .github/workflows/cygwin.yml   |  3 ++-
 winsup/cygwin/Makefile.am      |  4 ++--
 winsup/doc/faq-programming.xml |  3 ++-
 winsup/testsuite/Makefile.am   | 25 ++++++++++++++++++++++++-
 4 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/.github/workflows/cygwin.yml b/.github/workflows/cygwin.yml
index 575ff1fdc..248a3e4cd 100644
--- a/.github/workflows/cygwin.yml
+++ b/.github/workflows/cygwin.yml
@@ -71,6 +71,7 @@ jobs:
         packages: >-
           autoconf,
           automake,
+          busybox,
           cocom,
           dblatex,
           dejagnu,
@@ -116,6 +117,6 @@ jobs:
         export PATH=/usr/bin:$(cygpath ${SYSTEMROOT})/system32 &&
         export MAKEFLAGS=-j$(nproc) &&
         cd build &&
-        (export PATH=${{ matrix.target }}/winsup/testsuite/runtime:${PATH} && cmd /c $(cygpath -wa ${{ matrix.target }}/winsup/cygserver/cygserver) &) &&
+        (export PATH=${{ matrix.target }}/winsup/testsuite/testinst/bin:${PATH} && cmd /c $(cygpath -wa ${{ matrix.target }}/winsup/cygserver/cygserver) &) &&
         (cd ${{ matrix.target }}/winsup; make check || true)
       shell: C:\cygwin\bin\bash.exe --noprofile --norc -eo pipefail '{0}'
diff --git a/winsup/cygwin/Makefile.am b/winsup/cygwin/Makefile.am
index c34ca6ddc..bfb5ead10 100644
--- a/winsup/cygwin/Makefile.am
+++ b/winsup/cygwin/Makefile.am
@@ -602,8 +602,8 @@ $(NEW_DLL_NAME): $(LDSCRIPT) libdll.a $(VERSION_OFILES) $(LIBSERVER)\
 	$(newlib_build)/libm.a \
 	$(newlib_build)/libc.a \
 	-lgcc -lkernel32 -lntdll -Wl,-Map,cygwin.map
-	@$(MKDIR_P) ${target_builddir}/winsup/testsuite/runtime/
-	$(AM_V_at)$(INSTALL_PROGRAM) $(NEW_DLL_NAME) ${target_builddir}/winsup/testsuite/runtime/$(DLL_NAME)
+	@$(MKDIR_P) ${target_builddir}/winsup/testsuite/testinst/bin/
+	$(AM_V_at)$(INSTALL_PROGRAM) $(NEW_DLL_NAME) ${target_builddir}/winsup/testsuite/testinst/bin/$(DLL_NAME)
 
 # cygwin import library
 toolopts=--cpu=@target_cpu@ --ar=@AR@ --as=@AS@ --nm=@NM@ --objcopy=@OBJCOPY@
diff --git a/winsup/doc/faq-programming.xml b/winsup/doc/faq-programming.xml
index 7fc6baf25..15ae6eac4 100644
--- a/winsup/doc/faq-programming.xml
+++ b/winsup/doc/faq-programming.xml
@@ -697,7 +697,8 @@ Building these programs can be disabled with the <literal>--without-cross-bootst
 option to <literal>configure</literal>.
 </para>
 
-<!-- If you want to run the tests, <literal>dejagnu</literal> is also required. -->
+<!-- If you want to run the tests, <literal>dejagnu</literal> and
+     <literal>busybox</literal> are also required. -->
 
 <para>
 Building the documentation also requires the <literal>dblatex</literal>,
diff --git a/winsup/testsuite/Makefile.am b/winsup/testsuite/Makefile.am
index 7853d98e8..11332eda2 100644
--- a/winsup/testsuite/Makefile.am
+++ b/winsup/testsuite/Makefile.am
@@ -339,7 +339,7 @@ testdll_tmpdir = $(shell cygpath -ma $(tmpdir) | sed -e 's#^\([A-Z]\):#/cygdrive
 
 site-extra.exp: ../config.status Makefile
 	@rm -f ./tmp0
-	@echo "set runtime_root \"`pwd`/runtime\"" >> ./tmp0
+	@echo "set runtime_root \"`pwd`/testinst/bin\"" >> ./tmp0
 	@echo "set tmpdir $(tmpdir)" >> ./tmp0
 	@echo "set testdll_tmpdir $(testdll_tmpdir)" >> ./tmp0
 	@echo "set cygrun \"`pwd`/mingw/cygrun\"" >> ./tmp0
@@ -347,6 +347,29 @@ site-extra.exp: ../config.status Makefile
 
 EXTRA_DEJAGNU_SITE_CONFIG = site-extra.exp
 
+# Set up things in the Cygwin 'installation' at testsuite/testinst/ to provide
+# things which tests need to work
+#
+# * Create /tmp
+# * Ensure there is a /usr/bin/sh for tests which use system()
+# * Ensure there is a /usr/bin/sleep for tests which use system("sleep 10")
+# * Ensure there is a /usr/bin/ls for tests which  use system("ls")
+#
+# copy to avoid all the complexities: hardlink will fail if builddir is on a
+# separate filesystem, symlink would need to be constructed with regard to the
+# mounts of the test installation, and making it into /bin/ will cause
+# CreateProcess() to load cygwin1.dll from there.
+#
+# use busybox executables as they don't have any other shared library
+# dependencies other than cygwin1.dll.
+#
+
+check-local:
+	$(MKDIR_P) ${builddir}/testinst/tmp
+	cd ${builddir}/testinst/bin && cp /usr/libexec/busybox/bin/busybox.exe sh.exe
+	cd ${builddir}/testinst/bin && cp /usr/libexec/busybox/bin/busybox.exe sleep.exe
+	cd ${builddir}/testinst/bin && cp /usr/libexec/busybox/bin/busybox.exe ls.exe
+
 # target to build all the programs needed by check, without running check
 check_programs: $(check_PROGRAMS)
 
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 02/11] Cygwin: testsuite: Add a simple timeout mechanism
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
  2023-07-13 11:38 ` [PATCH 01/11] Cygwin: testsuite: Setup test prereqs in 'installation' the tests run in Jon Turney
@ 2023-07-13 11:38 ` Jon Turney
  2023-07-13 11:38 ` [PATCH 03/11] Cygwin: testsuite: Remove const from writable string in fcntl07b Jon Turney
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:38 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

Astonishingly, we don't have this already, so tests which hang just stop
the testsuite dead in it's tracks...
---
 winsup/testsuite/cygrun.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/winsup/testsuite/cygrun.c b/winsup/testsuite/cygrun.c
index e6c4aa705..925b5513f 100644
--- a/winsup/testsuite/cygrun.c
+++ b/winsup/testsuite/cygrun.c
@@ -20,6 +20,7 @@ main (int argc, char **argv)
 {
   STARTUPINFO sa;
   PROCESS_INFORMATION pi;
+  DWORD res;
   DWORD ec = 1;
   char *p;
 
@@ -42,9 +43,21 @@ main (int argc, char **argv)
       exit (1);
     }
 
-  WaitForSingleObject (pi.hProcess, INFINITE);
+  res = WaitForSingleObject (pi.hProcess, 60 * 1000);
 
-  GetExitCodeProcess (pi.hProcess, &ec);
+  if (res == WAIT_TIMEOUT)
+    {
+      char cmd[1024];
+      // there is no simple API to kill a Windows process tree
+      sprintf(cmd, "taskkill /f /t /pid %lu", GetProcessId(pi.hProcess));
+      system(cmd);
+      fprintf (stderr, "Timeout\n");
+      ec = 124;
+    }
+  else
+    {
+      GetExitCodeProcess (pi.hProcess, &ec);
+    }
 
   CloseHandle (pi.hProcess);
   CloseHandle (pi.hThread);
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 03/11] Cygwin: testsuite: Remove const from writable string in fcntl07b
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
  2023-07-13 11:38 ` [PATCH 01/11] Cygwin: testsuite: Setup test prereqs in 'installation' the tests run in Jon Turney
  2023-07-13 11:38 ` [PATCH 02/11] Cygwin: testsuite: Add a simple timeout mechanism Jon Turney
@ 2023-07-13 11:38 ` Jon Turney
  2023-07-13 11:38 ` [PATCH 04/11] Cygwin: testsuite: Skip devdsp test when no audio devices present Jon Turney
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:38 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

---
 winsup/testsuite/winsup.api/ltp/fcntl07B.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/winsup/testsuite/winsup.api/ltp/fcntl07B.c b/winsup/testsuite/winsup.api/ltp/fcntl07B.c
index 4e94ff267..db866fddd 100644
--- a/winsup/testsuite/winsup.api/ltp/fcntl07B.c
+++ b/winsup/testsuite/winsup.api/ltp/fcntl07B.c
@@ -170,7 +170,7 @@ const char *File1 = DEFAULT_FILE;
 
 #define DEFAULT_SUBPROG "test_open"
 const char *openck = DEFAULT_SUBPROG;	/* support program name to check for open FD */
-const char subprog_path[_POSIX_PATH_MAX];/* path to exec "openck" with */
+char subprog_path[_POSIX_PATH_MAX];/* path to exec "openck" with */
 #define STRSIZE 255
 #define FIFONAME "FiFo"
 
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 04/11] Cygwin: testsuite: Skip devdsp test when no audio devices present
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (2 preceding siblings ...)
  2023-07-13 11:38 ` [PATCH 03/11] Cygwin: testsuite: Remove const from writable string in fcntl07b Jon Turney
@ 2023-07-13 11:38 ` Jon Turney
  2023-07-13 11:38 ` [PATCH 05/11] Cygwin: testsuite: Just log result of second open of /dev/dsp Jon Turney
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:38 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

---
 winsup/testsuite/Makefile.am         |  3 +++
 winsup/testsuite/winsup.api/devdsp.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/winsup/testsuite/Makefile.am b/winsup/testsuite/Makefile.am
index 11332eda2..60111a0aa 100644
--- a/winsup/testsuite/Makefile.am
+++ b/winsup/testsuite/Makefile.am
@@ -325,6 +325,9 @@ AM_CPPFLAGS = -I$(srcdir)/libltp/include
 AM_LDFLAGS = $(LDFLAGS_FOR_TESTDLL)
 LDADD = $(builddir)/libltp.a $(builddir)/../cygwin/binmode.o $(LDADD_FOR_TESTDLL)
 
+# additional flags for specific test executables
+winsup_api_devdsp_LDADD = -lwinmm $(LDADD)
+
 DEJATOOL = winsup
 
 # Add '-v' to RUNTESTFLAGS if V=1
diff --git a/winsup/testsuite/winsup.api/devdsp.c b/winsup/testsuite/winsup.api/devdsp.c
index 6c8850a74..0ac76f085 100644
--- a/winsup/testsuite/winsup.api/devdsp.c
+++ b/winsup/testsuite/winsup.api/devdsp.c
@@ -27,6 +27,8 @@ details. */
 #include <errno.h>
 #include "test.h" /* use libltp framework */
 
+#include <windows.h>
+
 /* Controls if a child can open the device after the parent */
 #define CHILD_EXPECT 0 /* 0 or 1 */
 
@@ -59,6 +61,7 @@ void playwavtest (void);
 void syncwithchild (pid_t pid, int expected_exit_status);
 void cleanup (void);
 void dup_test (void);
+void devcheck (void);
 
 static int expect_child_failure = 0;
 
@@ -77,6 +80,7 @@ int
 main (int argc, char *argv[])
 {
   /*  tst_brkm(TBROK, cleanup, "see if it breaks all right"); */
+  devcheck ();
   ioctltest ();
   playbacktest ();
   recordingtest ();
@@ -91,6 +95,17 @@ main (int argc, char *argv[])
   return 0;
 }
 
+/* skip test if we don't have any audio devices*/
+void
+devcheck (void)
+{
+  if ((waveInGetNumDevs() == 0) || (waveOutGetNumDevs() == 0))
+    {
+      tst_resm (TINFO, "Skipping, no audio devices present");
+      exit(0);
+    }
+}
+
 /* test some extra ioctls */
 void
 ioctltest (void)
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 05/11] Cygwin: testsuite: Just log result of second open of /dev/dsp
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (3 preceding siblings ...)
  2023-07-13 11:38 ` [PATCH 04/11] Cygwin: testsuite: Skip devdsp test when no audio devices present Jon Turney
@ 2023-07-13 11:38 ` Jon Turney
  2023-07-13 11:38 ` [PATCH 06/11] Cygwin: testsuite: Also check direct call in systemcall Jon Turney
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:38 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

Do not rate successful second open of /dev/dsp as an error, just log the
result.

Based on this patch by Gerd Spalink:

https://cygwin.com/pipermail/cygwin-patches/2004q3/004848.html
---
 winsup/testsuite/winsup.api/devdsp.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/winsup/testsuite/winsup.api/devdsp.c b/winsup/testsuite/winsup.api/devdsp.c
index 0ac76f085..de3ccfa19 100644
--- a/winsup/testsuite/winsup.api/devdsp.c
+++ b/winsup/testsuite/winsup.api/devdsp.c
@@ -170,15 +170,11 @@ playbacktest (void)
 		strerror (errno));
     }
   audio2 = open ("/dev/dsp", O_WRONLY);
+  tst_resm (TINFO, "Second open /dev/dsp W %s ",
+	    audio2 >= 0 ? "WORKS" : "DOESN'T WORK");
   if (audio2 >= 0)
     {
-      tst_brkm (TFAIL, cleanup,
-		"Second open /dev/dsp W succeeded, but is expected to fail");
-    }
-  else if (errno != EBUSY)
-    {
-      tst_brkm (TFAIL, cleanup, "Expected EBUSY here, exit: %s",
-		strerror (errno));
+      close (audio2);
     }
   for (rate = 0; rate < sizeof (rates) / sizeof (int); rate++)
     for (k = 0; k < sizeof (sblut) / sizeof (struct sb); k++)
@@ -209,15 +205,11 @@ recordingtest (void)
 		strerror (errno));
     }
   audio2 = open ("/dev/dsp", O_RDONLY);
+  tst_resm (TINFO, "Second open /dev/dsp R %s",
+	    audio2 >= 0 ? "WORKS" : "DOESN'T WORK");
   if (audio2 >= 0)
     {
-      tst_brkm (TFAIL, cleanup,
-		"Second open /dev/dsp R succeeded, but is expected to fail");
-    }
-  else if (errno != EBUSY)
-    {
-      tst_brkm (TFAIL, cleanup, "Expected EBUSY here, exit: %s",
-		strerror (errno));
+      close (audio2);
     }
   for (rate = 0; rate < sizeof (rates) / sizeof (int); rate++)
     for (k = 0; k < sizeof (sblut) / sizeof (struct sb); k++)
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/11] Cygwin: testsuite: Also check direct call in systemcall
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (4 preceding siblings ...)
  2023-07-13 11:38 ` [PATCH 05/11] Cygwin: testsuite: Just log result of second open of /dev/dsp Jon Turney
@ 2023-07-13 11:38 ` Jon Turney
  2023-07-13 11:39 ` [PATCH 07/11] Cygwin: testsuite: Fix for limited thread priority values Jon Turney
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:38 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

Check direct call to system(), as well as one in a subprocess.

(This is a lot easier to debug when it's completely broken by the
environment the test is running in)
---
 winsup/testsuite/winsup.api/systemcall.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/winsup/testsuite/winsup.api/systemcall.c b/winsup/testsuite/winsup.api/systemcall.c
index d10c9825c..74bd6f870 100644
--- a/winsup/testsuite/winsup.api/systemcall.c
+++ b/winsup/testsuite/winsup.api/systemcall.c
@@ -26,6 +26,14 @@ main (int argc, char **argv)
       fprintf (stderr, "couldn't redirect stdout to /dev/null, fd %d - %s\n", fd, strerror (errno));
       exit (1);
     }
+
+  n = system ("ls");
+  if (n != 0)
+    {
+      fprintf (stderr, "system() (in parent) call returned %x\n", n);
+      exit (1);
+    }
+
   if (pipe (fds))
     {
       fprintf (stderr, "pipe call failed - %s\n", strerror (errno));
@@ -61,7 +69,7 @@ main (int argc, char **argv)
     }
   if (n != 0)
     {
-      fprintf (stderr, "system() call returned %x\n", n);
+      fprintf (stderr, "system() (in child) call returned %x\n", n);
       exit (1);
     }
   exit (0);
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 07/11] Cygwin: testsuite: Fix for limited thread priority values
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (5 preceding siblings ...)
  2023-07-13 11:38 ` [PATCH 06/11] Cygwin: testsuite: Also check direct call in systemcall Jon Turney
@ 2023-07-13 11:39 ` Jon Turney
  2023-07-13 11:39 ` [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5 Jon Turney
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:39 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

Since commit 4b51e4c1, we return the actual thread priority, not what we
originally stored in the thread attributes.

Windows only supports 7 thread priority levels, which we map onto the 32
required by POSIX.  So, only a subset of values will be returned exactly
by by pthread_getschedparam() after pthread_setschedparam().

Adjust tests priority1, priority2 and inherit1 so they only check for
round-tripping priority values which can be exactly represented.

For CI, this needs to handle process priority class "below normal
priority" as well.

Also check that the ranmge of priority values is at least 32, as
required by POSIX.
---
 .../testsuite/winsup.api/pthread/inherit1.c   | 21 +++++++++++++++-
 .../testsuite/winsup.api/pthread/priority1.c  | 24 +++++++++++++++++--
 .../testsuite/winsup.api/pthread/priority2.c  | 22 +++++++++++++++--
 3 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/winsup/testsuite/winsup.api/pthread/inherit1.c b/winsup/testsuite/winsup.api/pthread/inherit1.c
index 16c3f534b..f036462aa 100644
--- a/winsup/testsuite/winsup.api/pthread/inherit1.c
+++ b/winsup/testsuite/winsup.api/pthread/inherit1.c
@@ -50,6 +50,23 @@ void * func(void * arg)
   return (void *) (size_t)param.sched_priority;
 }
 
+// Windows only supports 7 thread priority levels, which we map onto the 32
+// required by POSIX.  The exact mapping also depends on the overall process
+// priority class. So only a subset of values will be returned exactly by
+// pthread_getschedparam() after pthread_setschedparam().
+int doable_pri(int pri)
+{
+  switch (GetPriorityClass(GetCurrentProcess()))
+    {
+    case BELOW_NORMAL_PRIORITY_CLASS:
+      return (pri == 2) || (pri ==  8) || (pri == 10) || (pri == 12) || (pri == 14) || (pri == 16) || (pri == 30);
+    case NORMAL_PRIORITY_CLASS:
+      return (pri == 2) || (pri == 12) || (pri == 14) || (pri == 16) || (pri == 18) || (pri == 20) || (pri == 30);
+    }
+
+  return TRUE;
+}
+
 int
 main()
 {
@@ -81,7 +98,9 @@ main()
       assert(pthread_setschedparam(mainThread, SCHED_FIFO, &mainParam) == 0);
       assert(pthread_getschedparam(mainThread, &policy, &mainParam) == 0);
       assert(policy == SCHED_FIFO);
-      assert(mainParam.sched_priority == prio);
+
+      if (doable_pri(prio))
+        assert(mainParam.sched_priority == prio);
 
       for (param.sched_priority = prio;
            param.sched_priority <= maxPrio;
diff --git a/winsup/testsuite/winsup.api/pthread/priority1.c b/winsup/testsuite/winsup.api/pthread/priority1.c
index a1e8d051d..135f77d76 100644
--- a/winsup/testsuite/winsup.api/pthread/priority1.c
+++ b/winsup/testsuite/winsup.api/pthread/priority1.c
@@ -50,7 +50,24 @@ void * func(void * arg)
   assert(policy == SCHED_FIFO);
   return (void *)(size_t)param.sched_priority;
 }
- 
+
+// Windows only supports 7 thread priority levels, which we map onto the 32
+// required by POSIX.  The exact mapping also depends on the overall process
+// priority class. So only a subset of values will be returned exactly by
+// pthread_getschedparam() after pthread_setschedparam().
+int doable_pri(int pri)
+{
+  switch (GetPriorityClass(GetCurrentProcess()))
+    {
+    case BELOW_NORMAL_PRIORITY_CLASS:
+      return (pri == 2) || (pri ==  8) || (pri == 10) || (pri == 12) || (pri == 14) || (pri == 16) || (pri == 30);
+    case NORMAL_PRIORITY_CLASS:
+      return (pri == 2) || (pri == 12) || (pri == 14) || (pri == 16) || (pri == 18) || (pri == 20) || (pri == 30);
+    }
+
+  return TRUE;
+}
+
 int
 main()
 {
@@ -61,6 +78,8 @@ main()
   int maxPrio = sched_get_priority_max(SCHED_FIFO);
   int minPrio = sched_get_priority_min(SCHED_FIFO);
 
+  assert((maxPrio - minPrio) >= 31);
+
   assert(pthread_attr_init(&attr) == 0);
   assert(pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED) == 0);
 
@@ -71,7 +90,8 @@ main()
       assert(pthread_attr_setschedparam(&attr, &param) == 0);
       assert(pthread_create(&t, &attr, func, NULL) == 0);
       pthread_join(t, &result);
-      assert((int)(size_t) result == param.sched_priority);
+      if (doable_pri(param.sched_priority))
+	assert((int)(size_t) result == param.sched_priority);
     }
 
   return 0;
diff --git a/winsup/testsuite/winsup.api/pthread/priority2.c b/winsup/testsuite/winsup.api/pthread/priority2.c
index 0534e7ba1..f084efadf 100644
--- a/winsup/testsuite/winsup.api/pthread/priority2.c
+++ b/winsup/testsuite/winsup.api/pthread/priority2.c
@@ -54,7 +54,24 @@ void * func(void * arg)
   assert(policy == SCHED_FIFO);
   return (void *) (size_t)param.sched_priority;
 }
- 
+
+// Windows only supports 7 thread priority levels, which we map onto the 32
+// required by POSIX.  The exact mapping also depends on the overall process
+// priority class. So only a subset of values will be returned exactly by
+// pthread_getschedparam() after pthread_setschedparam().
+int doable_pri(int pri)
+{
+  switch (GetPriorityClass(GetCurrentProcess()))
+    {
+    case BELOW_NORMAL_PRIORITY_CLASS:
+      return (pri == 2) || (pri ==  8) || (pri == 10) || (pri == 12) || (pri == 14) || (pri == 16) || (pri == 30);
+    case NORMAL_PRIORITY_CLASS:
+      return (pri == 2) || (pri == 12) || (pri == 14) || (pri == 16) || (pri == 18) || (pri == 20) || (pri == 30);
+    }
+
+  return TRUE;
+}
+
 int
 main()
 {
@@ -73,7 +90,8 @@ main()
       assert(pthread_setschedparam(t, SCHED_FIFO, &param) == 0);
       assert(pthread_mutex_unlock(&startMx) == 0);
       pthread_join(t, &result);
-      assert((int)(size_t)result == param.sched_priority);
+      if (doable_pri(param.sched_priority))
+	assert((int)(size_t)result == param.sched_priority);
     }
 
   return 0;
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (6 preceding siblings ...)
  2023-07-13 11:39 ` [PATCH 07/11] Cygwin: testsuite: Fix for limited thread priority values Jon Turney
@ 2023-07-13 11:39 ` Jon Turney
  2023-07-13 11:43   ` Jon Turney
  2023-07-13 18:16   ` Corinna Vinschen
  2023-07-13 11:39 ` [PATCH 09/11] Cygwin: testsuite: Fix a buffer overflow in symlink01 Jon Turney
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:39 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

These tests async thread cancellation of a thread that doesn't have any
cancellation points.

Unfortunately, since 450f557f the async cancellation silently fails when
the thread is inside the kernel function Sleep(), so it just exits
normally after 10 seconds. (See the commentary in pthread::cancel() in
thread.cc, where it checks if the target thread is inside the kernel,
and silently converts the cancellation into a deferred one)

Work around this by busy-waiting rather than Sleep()ing for 10 seconds.

This is still somewhat fragile: the async cancel could still fail, if it
happens to occur while we're inside the kernel function that time()
calls.

v2:
Do nothing more efficiently
---
 winsup/testsuite/winsup.api/pthread/cancel3.c | 24 ++++++++++++++-----
 winsup/testsuite/winsup.api/pthread/cancel5.c | 24 ++++++++++++++-----
 2 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/winsup/testsuite/winsup.api/pthread/cancel3.c b/winsup/testsuite/winsup.api/pthread/cancel3.c
index 832fe2e3f..07feb7c9b 100644
--- a/winsup/testsuite/winsup.api/pthread/cancel3.c
+++ b/winsup/testsuite/winsup.api/pthread/cancel3.c
@@ -75,11 +75,22 @@ mythread(void * arg)
   assert(pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL) == 0);
 
   /*
-   * We wait up to 10 seconds, waking every 0.1 seconds,
-   * for a cancelation to be applied to us.
+   * We wait up to 10 seconds for a cancelation to be applied to us.
    */
-  for (bag->count = 0; bag->count < 100; bag->count++)
-    Sleep(100);
+  for (bag->count = 0; bag->count < 10; bag->count++)
+    {
+      /* Busy wait to avoid Sleep(), since we can't asynchronous cancel inside a
+	 kernel function. (This is still somewhat fragile as if the async cancel
+	 can fail if it happens to occur while we're inside the kernel function
+	 that time() calls...)  */
+      time_t start = time(NULL);
+      while ((time(NULL) - start) < 1)
+	{
+	  int i;
+	  for (i = 0; i < 1E7; i++)
+	    __asm__ volatile ("pause":::);
+	}
+    }
 
   return result;
 }
@@ -149,10 +160,11 @@ main()
 
       if (fail)
 	{
-	  fprintf(stderr, "Thread %d: started %d: count %d\n",
+	  fprintf(stderr, "Thread %d: started %d: count %d: result %d \n",
 		  i,
 		  threadbag[i].started,
-		  threadbag[i].count);
+		  threadbag[i].count,
+		  result);
 	}
       failed = (failed || fail);
     }
diff --git a/winsup/testsuite/winsup.api/pthread/cancel5.c b/winsup/testsuite/winsup.api/pthread/cancel5.c
index 8b7240615..999b3c95c 100644
--- a/winsup/testsuite/winsup.api/pthread/cancel5.c
+++ b/winsup/testsuite/winsup.api/pthread/cancel5.c
@@ -76,11 +76,22 @@ mythread(void * arg)
   assert(pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL) == 0);
 
   /*
-   * We wait up to 10 seconds, waking every 0.1 seconds,
-   * for a cancelation to be applied to us.
+   * We wait up to 10 seconds for a cancelation to be applied to us.
    */
-  for (bag->count = 0; bag->count < 100; bag->count++)
-    Sleep(100);
+  for (bag->count = 0; bag->count < 10; bag->count++)
+    {
+      /* Busy wait to avoid Sleep(), since we can't asynchronous cancel inside a
+	 kernel function. (This is still somewhat fragile as if the async cancel
+	 can fail if it happens to occur while we're inside the kernel function
+	 that time() calls...)  */
+      time_t start = time(NULL);
+      while ((time(NULL) - start) < 1)
+	{
+	  int i;
+	  for (i = 0; i < 1E7; i++)
+	    __asm__ volatile ("pause":::);
+	}
+    }
 
   return result;
 }
@@ -148,10 +159,11 @@ main()
 
       if (fail)
 	{
-	  fprintf(stderr, "Thread %d: started %d: count %d\n",
+	  fprintf(stderr, "Thread %d: started %d: count %d: result %d\n",
 		  i,
 		  threadbag[i].started,
-		  threadbag[i].count);
+		  threadbag[i].count,
+		  result);
 	}
       failed = (failed || fail);
     }
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 09/11] Cygwin: testsuite: Fix a buffer overflow in symlink01
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (7 preceding siblings ...)
  2023-07-13 11:39 ` [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5 Jon Turney
@ 2023-07-13 11:39 ` Jon Turney
  2023-07-13 18:17   ` Corinna Vinschen
  2023-07-13 11:39 ` [PATCH 10/11] Cygwin: testsuite: Minor fixes to umask03 Jon Turney
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:39 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

See ltp commit 44d51c3f
---
 winsup/testsuite/winsup.api/ltp/symlink01.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/winsup/testsuite/winsup.api/ltp/symlink01.c b/winsup/testsuite/winsup.api/ltp/symlink01.c
index 54a24b87f..186a85b4e 100644
--- a/winsup/testsuite/winsup.api/ltp/symlink01.c
+++ b/winsup/testsuite/winsup.api/ltp/symlink01.c
@@ -488,7 +488,7 @@ time_t a_time_value = 100;
 const char  *TCID = NULL;
 char  *Selectedtests = NULL;		/* Name (tcid) of selected test cases */
 char test_msg[BUFMAX];
-char full_path[PATH_MAX+1];
+char full_path[PATH_MAX+1+1]; /* Add one for '\0' and another to exceed the PATH_MAX limit, see creat_path_max() */
 extern int Tst_count;
 extern char *TESTDIR;
 extern char *strrchr();
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 10/11] Cygwin: testsuite: Minor fixes to umask03
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (8 preceding siblings ...)
  2023-07-13 11:39 ` [PATCH 09/11] Cygwin: testsuite: Fix a buffer overflow in symlink01 Jon Turney
@ 2023-07-13 11:39 ` Jon Turney
  2023-07-13 18:18   ` Corinna Vinschen
  2023-07-13 11:39 ` [PATCH 11/11] Cygwin: testsuite: Drop Adminstrator privileges while running tests Jon Turney
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:39 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

See ltp commits f32691e7, 923b23ff and b846e7bb
---
 winsup/testsuite/winsup.api/ltp/umask03.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/winsup/testsuite/winsup.api/ltp/umask03.c b/winsup/testsuite/winsup.api/ltp/umask03.c
index 341da7507..84d2f089d 100644
--- a/winsup/testsuite/winsup.api/ltp/umask03.c
+++ b/winsup/testsuite/winsup.api/ltp/umask03.c
@@ -19,7 +19,7 @@
 
 /*
  * NAME
- *	umask01.c
+ *	umask03.c
  *
  * DESCRIPTION
  *	Check that umask changes the mask, and that the previous
@@ -30,7 +30,7 @@
  *	corresponds to the previous value set.
  *
  * USAGE:  <for command-line>
- *		umask01 [-c n] [-i n] [-I x] [-P x] [-t]
+ *		umask03 [-c n] [-i n] [-I x] [-P x] [-t]
  *		where,  -c n : Run n copies concurrently.
  *			-i n : Execute test n times.
  *			-I x : Execute test for x seconds.
@@ -51,7 +51,7 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 
-const char *TCID = "umask01";
+const char *TCID = "umask03";
 int TST_TOTAL = 1;
 extern int Tst_count;
 
@@ -68,6 +68,7 @@ main(int argc, char **argv)
 	
 	struct stat statbuf;
 	unsigned mskval = 0000;
+	int failcnt = 0;
 	int fildes, i;
 	unsigned low9mode;
 
@@ -99,12 +100,13 @@ main(int argc, char **argv)
 				} else {
 					low9mode = statbuf.st_mode & 0777;
 					if (low9mode != (~mskval & 0777)) {
-						tst_brkm(TBROK, cleanup,
-							 "got %0 expected %o"
-							 "mask didnot take",
+						tst_resm(TBROK,
+							 "got mode %o expected %o "
+							 "mask %o did not take",
 							 low9mode,
-							 (~mskval & 0777));
-						/*NOTREACHED*/
+							 (~mskval & 0777),
+							 mskval);
+						failcnt++;
 					} else {
 						tst_resm(TPASS, "Test "
 							"condition: %d, umask: "
@@ -114,6 +116,9 @@ main(int argc, char **argv)
 			}
 			close(fildes);
 		}
+		if (!failcnt)
+			tst_resm(TPASS, "umask correctly returns the "
+					"previous value for all masks");
 	}
 	cleanup();
 	/*NOTREACHED*/
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 11/11] Cygwin: testsuite: Drop Adminstrator privileges while running tests
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (9 preceding siblings ...)
  2023-07-13 11:39 ` [PATCH 10/11] Cygwin: testsuite: Minor fixes to umask03 Jon Turney
@ 2023-07-13 11:39 ` Jon Turney
  2023-07-13 18:05 ` [PATCH 00/11] More testsuite fixes Corinna Vinschen
  2023-07-17 11:58 ` Jon Turney
  12 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:39 UTC (permalink / raw)
  To: cygwin-patches; +Cc: Jon Turney

Test access05 and symlink03 expect operations to fail which succeed when
we have Adminstrator privileges.

There's perhaps a bit of incoherency here: some XFAILed tests expect to
run as root (so maybe we need the ability to selectively cygdrop?).
---
 .github/workflows/cygwin.yml           | 1 +
 winsup/doc/faq-programming.xml         | 5 +++--
 winsup/testsuite/winsup.api/winsup.exp | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/cygwin.yml b/.github/workflows/cygwin.yml
index 248a3e4cd..39553d37a 100644
--- a/.github/workflows/cygwin.yml
+++ b/.github/workflows/cygwin.yml
@@ -73,6 +73,7 @@ jobs:
           automake,
           busybox,
           cocom,
+          cygutils-extra,
           dblatex,
           dejagnu,
           docbook-xml45,
diff --git a/winsup/doc/faq-programming.xml b/winsup/doc/faq-programming.xml
index 15ae6eac4..2c684bb2b 100644
--- a/winsup/doc/faq-programming.xml
+++ b/winsup/doc/faq-programming.xml
@@ -697,8 +697,9 @@ Building these programs can be disabled with the <literal>--without-cross-bootst
 option to <literal>configure</literal>.
 </para>
 
-<!-- If you want to run the tests, <literal>dejagnu</literal> and
-     <literal>busybox</literal> are also required. -->
+<!-- If you want to run the tests, <literal>dejagnu</literal>,
+     <literal>busybox</literal> and <literal>cygutils-extra<literal> are also
+     required. -->
 
 <para>
 Building the documentation also requires the <literal>dblatex</literal>,
diff --git a/winsup/testsuite/winsup.api/winsup.exp b/winsup/testsuite/winsup.api/winsup.exp
index fb3e3816c..111509511 100644
--- a/winsup/testsuite/winsup.api/winsup.exp
+++ b/winsup/testsuite/winsup.api/winsup.exp
@@ -64,7 +64,7 @@ foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c $srcdir/$subdir/*/*.{cc
 	    }
 	    file mkdir $tmpdir/$tmpfile
 	    set env(PATH) "$runtime_root:$env(PATH)"
-	    ws_spawn "$cygrun $exec $testdll_tmpdir/$tmpfile > $redirect_output"
+	    ws_spawn "cygdrop $cygrun $exec $testdll_tmpdir/$tmpfile > $redirect_output"
 	    file delete -force $tmpdir/$tmpfile
 	    set env(PATH) "$orig_path"
 	    if { $rv } {
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-13 11:39 ` [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5 Jon Turney
@ 2023-07-13 11:43   ` Jon Turney
  2023-07-13 18:16   ` Corinna Vinschen
  1 sibling, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-13 11:43 UTC (permalink / raw)
  To: Cygwin Patches

On 13/07/2023 12:39, Jon Turney wrote:
> These tests async thread cancellation of a thread that doesn't have any
> cancellation points.
> 
> Unfortunately, since 450f557f the async cancellation silently fails when
> the thread is inside the kernel function Sleep(), so it just exits
> normally after 10 seconds. (See the commentary in pthread::cancel() in
> thread.cc, where it checks if the target thread is inside the kernel,
> and silently converts the cancellation into a deferred one)
> 
> Work around this by busy-waiting rather than Sleep()ing for 10 seconds.
> 
> This is still somewhat fragile: the async cancel could still fail, if it
> happens to occur while we're inside the kernel function that time()
> calls.
> 
> v2:
> Do nothing more efficiently
> ---
>   winsup/testsuite/winsup.api/pthread/cancel3.c | 24 ++++++++++++++-----
>   winsup/testsuite/winsup.api/pthread/cancel5.c | 24 ++++++++++++++-----
>   2 files changed, 36 insertions(+), 12 deletions(-)
> 

This still seems a bit flaky, for the reasons identified. Perhaps 
there's a better way of doing a pause without a cancellation point or 
entering the kernel? Or a better way to test that async cancellation 
actually works?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 00/11] More testsuite fixes
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (10 preceding siblings ...)
  2023-07-13 11:39 ` [PATCH 11/11] Cygwin: testsuite: Drop Adminstrator privileges while running tests Jon Turney
@ 2023-07-13 18:05 ` Corinna Vinschen
  2023-07-17 11:58 ` Jon Turney
  12 siblings, 0 replies; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-13 18:05 UTC (permalink / raw)
  To: cygwin-patches

On Jul 13 12:38, Jon Turney wrote:
> This gets us from :
> 
> FAIL: cygload
> FAIL: devdsp.c
> FAIL: ltp/access05.c
> FAIL: ltp/fcntl07.c
> FAIL: ltp/symlink01.c
> FAIL: ltp/symlink03.c
> FAIL: ltp/umask03.c
> FAIL: pthread/cancel11.c
> FAIL: pthread/cancel3.c
> FAIL: pthread/cancel5.c
> FAIL: pthread/inherit1.c
> FAIL: pthread/priority1.c
> FAIL: pthread/priority2.c
> FAIL: systemcall.c
> 
> to:
> 
> FAIL: cygload
> FAIL: devdsp.c
> FAIL: ltp/umask03.c
> FAIL: pthread/cancel11.c
> FAIL: pthread/priority1.c
> 
> Notes on the remaining failures:

Please add Signed-off-by tags to your patches.

Thanks,
Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-13 11:39 ` [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5 Jon Turney
  2023-07-13 11:43   ` Jon Turney
@ 2023-07-13 18:16   ` Corinna Vinschen
  2023-07-13 18:37     ` Corinna Vinschen
  1 sibling, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-13 18:16 UTC (permalink / raw)
  To: cygwin-patches

On Jul 13 12:39, Jon Turney wrote:
> These tests async thread cancellation of a thread that doesn't have any
> cancellation points.
> 
> Unfortunately, since 450f557f the async cancellation silently fails when
> the thread is inside the kernel function Sleep(), so it just exits

I'm not sure how this patch should be the actual culprit.  It only
handles thread priorities, not thread cancellation.  Isn't this rather
2b165a453ea7b or some such?

> normally after 10 seconds. (See the commentary in pthread::cancel() in
> thread.cc, where it checks if the target thread is inside the kernel,
> and silently converts the cancellation into a deferred one)

Nevertheless, I think this is ok to do.  The description of pthread_cancel
contains this:

  Asynchronous cancelability means that the thread can be canceled at
  any time (usually immediately, but the system does not guarantee this).

And

  The above steps happen asynchronously with respect to the
  pthread_cancel() call; the return status of pthread_cancel() merely
  informs the caller whether the cancellation request was successfully
  queued.

So any assumption *when* the cancallation takes place is may be wrong.


Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] Cygwin: testsuite: Fix a buffer overflow in symlink01
  2023-07-13 11:39 ` [PATCH 09/11] Cygwin: testsuite: Fix a buffer overflow in symlink01 Jon Turney
@ 2023-07-13 18:17   ` Corinna Vinschen
  2023-07-14 13:04     ` Jon Turney
  0 siblings, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-13 18:17 UTC (permalink / raw)
  To: cygwin-patches

On Jul 13 12:39, Jon Turney wrote:
> See ltp commit 44d51c3f

Can you please add some text which helps the reader to understand the
issue without having to refer to the ltp repo?


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 10/11] Cygwin: testsuite: Minor fixes to umask03
  2023-07-13 11:39 ` [PATCH 10/11] Cygwin: testsuite: Minor fixes to umask03 Jon Turney
@ 2023-07-13 18:18   ` Corinna Vinschen
  0 siblings, 0 replies; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-13 18:18 UTC (permalink / raw)
  To: cygwin-patches

On Jul 13 12:39, Jon Turney wrote:
> See ltp commits f32691e7, 923b23ff and b846e7bb

Ditto?


Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-13 18:16   ` Corinna Vinschen
@ 2023-07-13 18:37     ` Corinna Vinschen
  2023-07-13 18:53       ` Corinna Vinschen
  0 siblings, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-13 18:37 UTC (permalink / raw)
  To: cygwin-patches

On Jul 13 20:16, Corinna Vinschen wrote:
> On Jul 13 12:39, Jon Turney wrote:
> > These tests async thread cancellation of a thread that doesn't have any
> > cancellation points.
> > 
> > Unfortunately, since 450f557f the async cancellation silently fails when
> > the thread is inside the kernel function Sleep(), so it just exits
> 
> I'm not sure how this patch should be the actual culprit.  It only
> handles thread priorities, not thread cancellation.  Isn't this rather
> 2b165a453ea7b or some such?
> 
> > normally after 10 seconds. (See the commentary in pthread::cancel() in
> > thread.cc, where it checks if the target thread is inside the kernel,
> > and silently converts the cancellation into a deferred one)
> 
> Nevertheless, I think this is ok to do.  The description of pthread_cancel
> contains this:
> 
>   Asynchronous cancelability means that the thread can be canceled at
>   any time (usually immediately, but the system does not guarantee this).
> 
> And
> 
>   The above steps happen asynchronously with respect to the
>   pthread_cancel() call; the return status of pthread_cancel() merely
>   informs the caller whether the cancellation request was successfully
>   queued.
> 
> So any assumption *when* the cancallation takes place is may be wrong.

I wonder, though, if we can't come up with a better solution than just
waiting for the next cancellation point.

No solution comes to mind if the user code calls a Win32 function, but
maybe _sigbe could check if the thread's cancel_event is set?  It's all
in assembler, that complicates it a bit, but that would make it at least
working for POSIX functions which are no cancellation points.


Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-13 18:37     ` Corinna Vinschen
@ 2023-07-13 18:53       ` Corinna Vinschen
  2023-07-14 13:04         ` Jon Turney
  0 siblings, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-13 18:53 UTC (permalink / raw)
  To: cygwin-patches

On Jul 13 20:37, Corinna Vinschen wrote:
> On Jul 13 20:16, Corinna Vinschen wrote:
> > On Jul 13 12:39, Jon Turney wrote:
> > > These tests async thread cancellation of a thread that doesn't have any
> > > cancellation points.
> > > 
> > > Unfortunately, since 450f557f the async cancellation silently fails when
> > > the thread is inside the kernel function Sleep(), so it just exits
> > 
> > I'm not sure how this patch should be the actual culprit.  It only
> > handles thread priorities, not thread cancellation.  Isn't this rather
> > 2b165a453ea7b or some such?
> > 
> > > normally after 10 seconds. (See the commentary in pthread::cancel() in
> > > thread.cc, where it checks if the target thread is inside the kernel,
> > > and silently converts the cancellation into a deferred one)
> > 
> > Nevertheless, I think this is ok to do.  The description of pthread_cancel
> > contains this:
> > 
> >   Asynchronous cancelability means that the thread can be canceled at
> >   any time (usually immediately, but the system does not guarantee this).
> > 
> > And
> > 
> >   The above steps happen asynchronously with respect to the
> >   pthread_cancel() call; the return status of pthread_cancel() merely
> >   informs the caller whether the cancellation request was successfully
> >   queued.
> > 
> > So any assumption *when* the cancallation takes place is may be wrong.
> 
> I wonder, though, if we can't come up with a better solution than just
> waiting for the next cancellation point.
> 
> No solution comes to mind if the user code calls a Win32 function, but
> maybe _sigbe could check if the thread's cancel_event is set?  It's all
> in assembler, that complicates it a bit, but that would make it at least
> working for POSIX functions which are no cancellation points.

Alternatively, maybe we can utilize the existing signal handler and
just send a Cygwin-only signal outside the maskable signal range.
wait_sig calls sigpacket::process like for any other standard signal,
The signal handler is basically pthread::static_cancel_self().
Something like that.


Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-13 18:53       ` Corinna Vinschen
@ 2023-07-14 13:04         ` Jon Turney
  2023-07-14 18:57           ` Corinna Vinschen
  2023-07-17 11:51           ` Jon Turney
  0 siblings, 2 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-14 13:04 UTC (permalink / raw)
  To: Cygwin Patches

On 13/07/2023 19:53, Corinna Vinschen wrote:
> On Jul 13 20:37, Corinna Vinschen wrote:
>> On Jul 13 20:16, Corinna Vinschen wrote:
>>> On Jul 13 12:39, Jon Turney wrote:
>>>> These tests async thread cancellation of a thread that doesn't have any
>>>> cancellation points.
>>>>
>>>> Unfortunately, since 450f557f the async cancellation silently fails when
>>>> the thread is inside the kernel function Sleep(), so it just exits
>>>
>>> I'm not sure how this patch should be the actual culprit.  It only
>>> handles thread priorities, not thread cancellation.  Isn't this rather
>>> 2b165a453ea7b or some such?

Yeah, I messed up there somehow. I will fix the commit id.

>>>> normally after 10 seconds. (See the commentary in pthread::cancel() in
>>>> thread.cc, where it checks if the target thread is inside the kernel,
>>>> and silently converts the cancellation into a deferred one)
>>>
>>> Nevertheless, I think this is ok to do.  The description of pthread_cancel
>>> contains this:
>>>
>>>    Asynchronous cancelability means that the thread can be canceled at
>>>    any time (usually immediately, but the system does not guarantee this).
>>>
>>> And
>>>
>>>    The above steps happen asynchronously with respect to the
>>>    pthread_cancel() call; the return status of pthread_cancel() merely
>>>    informs the caller whether the cancellation request was successfully
>>>    queued.
>>>
>>> So any assumption *when* the cancallation takes place is may be wrong.

Yeah.

I think the flakiness is when we happen to try to async cancel while in 
the Windows kernel, which implicitly converts to a deferred 
cancellation, but there are no cancellation points in the the thread, so 
it arrives at pthread_exit() and returns a exit code other than 
PTHREAD_CANCELED.

I did consider making the test non-flaky by adding a final call to 
pthread_testcancel(), to notice any failed async cancellation which has 
been converted to a deferred one.

But then that is just the same as the deferred cancellation tests, and 
confirms the cancellation happens, but not that it's async, which is 
part of the point of the test.

I guess this could also check that not all of the threads ran for all 10 
seconds, which would indicate that at least some of them were cancelled 
asynchronously.

>> I wonder, though, if we can't come up with a better solution than just
>> waiting for the next cancellation point.
>>
>> No solution comes to mind if the user code calls a Win32 function, but
>> maybe _sigbe could check if the thread's cancel_event is set?  It's all
>> in assembler, that complicates it a bit, but that would make it at least
>> working for POSIX functions which are no cancellation points.

I think you'd need to record the "pending async cancellation" as 
different to "deferred cancellation" so this doesn't turn everything 
into a cancellation point.

> Alternatively, maybe we can utilize the existing signal handler and
> just send a Cygwin-only signal outside the maskable signal range.
> wait_sig calls sigpacket::process like for any other standard signal,
> The signal handler is basically pthread::static_cancel_self().
> Something like that.
I'm not sure this is worth lots of effort, as thread cancellation seems 
to be regarded as mis-specified in such as way as to make it unsafe for 
serious use.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] Cygwin: testsuite: Fix a buffer overflow in symlink01
  2023-07-13 18:17   ` Corinna Vinschen
@ 2023-07-14 13:04     ` Jon Turney
  0 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-14 13:04 UTC (permalink / raw)
  To: Cygwin Patches

On 13/07/2023 19:17, Corinna Vinschen wrote:
> On Jul 13 12:39, Jon Turney wrote:
>> See ltp commit 44d51c3f
> 
> Can you please add some text which helps the reader to understand the
> issue without having to refer to the ltp repo?

Sure, I added that.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-14 13:04         ` Jon Turney
@ 2023-07-14 18:57           ` Corinna Vinschen
  2023-07-17 11:05             ` Corinna Vinschen
  2023-07-17 11:51           ` Jon Turney
  1 sibling, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-14 18:57 UTC (permalink / raw)
  To: cygwin-patches

On Jul 14 14:04, Jon Turney wrote:
> On 13/07/2023 19:53, Corinna Vinschen wrote:
> > > > Nevertheless, I think this is ok to do.  The description of pthread_cancel
> > > > contains this:
> > > > 
> > > >    Asynchronous cancelability means that the thread can be canceled at
> > > >    any time (usually immediately, but the system does not guarantee this).
> > > > 
> > > > And
> > > > 
> > > >    The above steps happen asynchronously with respect to the
> > > >    pthread_cancel() call; the return status of pthread_cancel() merely
> > > >    informs the caller whether the cancellation request was successfully
> > > >    queued.
> > > > 
> > > > So any assumption *when* the cancallation takes place is may be wrong.
> 
> Yeah.
> 
> I think the flakiness is when we happen to try to async cancel while in the
> Windows kernel, which implicitly converts to a deferred cancellation, but
> there are no cancellation points in the the thread, so it arrives at
> pthread_exit() and returns a exit code other than PTHREAD_CANCELED.

In pthread_join(), right?

> I did consider making the test non-flaky by adding a final call to
> pthread_testcancel(), to notice any failed async cancellation which has been
> converted to a deferred one.
> 
> But then that is just the same as the deferred cancellation tests, and
> confirms the cancellation happens, but not that it's async, which is part of
> the point of the test.

What if Cygwin checks for a deferred cancellation in pthread::exit,
too?  It needs to do this by its own, not calling pthread::testcancel,
otherwise we're in an infinite loop.  Since cancel is basically like
exit, just with a PTHREAD_CANCELED return value, the only additional
action would be to set retval to PTHREAD_CANCELED explicitely.


Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-14 18:57           ` Corinna Vinschen
@ 2023-07-17 11:05             ` Corinna Vinschen
  2023-07-17 11:51               ` Jon Turney
  0 siblings, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-17 11:05 UTC (permalink / raw)
  To: Jon Turney; +Cc: cygwin-patches

On Jul 14 20:57, Corinna Vinschen wrote:
> On Jul 14 14:04, Jon Turney wrote:
> > On 13/07/2023 19:53, Corinna Vinschen wrote:
> > > > > Nevertheless, I think this is ok to do.  The description of pthread_cancel
> > > > > contains this:
> > > > > 
> > > > >    Asynchronous cancelability means that the thread can be canceled at
> > > > >    any time (usually immediately, but the system does not guarantee this).
> > > > > 
> > > > > And
> > > > > 
> > > > >    The above steps happen asynchronously with respect to the
> > > > >    pthread_cancel() call; the return status of pthread_cancel() merely
> > > > >    informs the caller whether the cancellation request was successfully
> > > > >    queued.
> > > > > 
> > > > > So any assumption *when* the cancallation takes place is may be wrong.
> > 
> > Yeah.
> > 
> > I think the flakiness is when we happen to try to async cancel while in the
> > Windows kernel, which implicitly converts to a deferred cancellation, but
> > there are no cancellation points in the the thread, so it arrives at
> > pthread_exit() and returns a exit code other than PTHREAD_CANCELED.
> 
> In pthread_join(), right?
> 
> > I did consider making the test non-flaky by adding a final call to
> > pthread_testcancel(), to notice any failed async cancellation which has been
> > converted to a deferred one.
> > 
> > But then that is just the same as the deferred cancellation tests, and
> > confirms the cancellation happens, but not that it's async, which is part of
> > the point of the test.
> 
> What if Cygwin checks for a deferred cancellation in pthread::exit,
> too?  It needs to do this by its own, not calling pthread::testcancel,
> otherwise we're in an infinite loop.  Since cancel is basically like
> exit, just with a PTHREAD_CANCELED return value, the only additional
> action would be to set retval to PTHREAD_CANCELED explicitely.

Kind of like this:

diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
index f614e01c42f6..fceb9bda1806 100644
--- a/winsup/cygwin/thread.cc
+++ b/winsup/cygwin/thread.cc
@@ -546,6 +546,13 @@ pthread::exit (void *value_ptr)
   class pthread *thread = this;
   _cygtls *tls = cygtls;	/* Save cygtls before deleting this. */
 
+  /* Deferred cancellation still pending? */
+  if (canceled)
+    {
+      WaitForSingleObject (cancel_event, INFINITE);
+      value_ptr = PTHREAD_CANCELED;
+    }
+
   // run cleanup handlers
   pop_all_cleanup_handlers ();
 
What do you think?


Corinna


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-14 13:04         ` Jon Turney
  2023-07-14 18:57           ` Corinna Vinschen
@ 2023-07-17 11:51           ` Jon Turney
  2023-07-17 14:04             ` Corinna Vinschen
  1 sibling, 1 reply; 37+ messages in thread
From: Jon Turney @ 2023-07-17 11:51 UTC (permalink / raw)
  To: Cygwin Patches

[-- Attachment #1: Type: text/plain, Size: 2285 bytes --]

On 14/07/2023 14:04, Jon Turney wrote:
> On 13/07/2023 19:53, Corinna Vinschen wrote:
>>>>> normally after 10 seconds. (See the commentary in pthread::cancel() in
>>>>> thread.cc, where it checks if the target thread is inside the kernel,
>>>>> and silently converts the cancellation into a deferred one)
>>>>
>>>> Nevertheless, I think this is ok to do.  The description of 
>>>> pthread_cancel
>>>> contains this:
>>>>
>>>>    Asynchronous cancelability means that the thread can be canceled at
>>>>    any time (usually immediately, but the system does not guarantee 
>>>> this).
>>>>
>>>> And
>>>>
>>>>    The above steps happen asynchronously with respect to the
>>>>    pthread_cancel() call; the return status of pthread_cancel() merely
>>>>    informs the caller whether the cancellation request was successfully
>>>>    queued.
>>>>
>>>> So any assumption *when* the cancallation takes place is may be wrong.
> 
> Yeah.
> 
> I think the flakiness is when we happen to try to async cancel while in 
> the Windows kernel, which implicitly converts to a deferred 
> cancellation, but there are no cancellation points in the the thread, so 
> it arrives at pthread_exit() and returns a exit code other than 
> PTHREAD_CANCELED.
> 
> I did consider making the test non-flaky by adding a final call to 
> pthread_testcancel(), to notice any failed async cancellation which has 
> been converted to a deferred one.
> 
> But then that is just the same as the deferred cancellation tests, and 
> confirms the cancellation happens, but not that it's async, which is 
> part of the point of the test.
> 
> I guess this could also check that not all of the threads ran for all 10 
> seconds, which would indicate that at least some of them were cancelled 
> asynchronously.

I wrote this, attached, which does indeed make this test more reliable.

You point is well made that this is making assumptions about how quickly 
async cancellation works that are not required by the standard

(It would be a valid, if strange implementation, if async cancellation 
always took 10 seconds to take effect, which this test assumes isn't the 
case)

Perhaps there is a better way to write a test that async cancellation 
works in the absence of cancellation points, but it eludes me...

[-- Attachment #2: 0001-Cygwin-testsuite-Make-cancel3-and-cancel5-more-robus.patch --]
[-- Type: text/plain, Size: 2819 bytes --]

From d0023fa3ea1e8f29e80d473ab13d8200bdd2dc3a Mon Sep 17 00:00:00 2001
From: Jon Turney <jon.turney@dronecode.org.uk>
Date: Sat, 15 Jul 2023 17:57:43 +0100
Subject: [PATCH] Cygwin: testsuite: Make cancel3 and cancel5 more robust

Despite our efforts, sometimes the async cancellation gets deferred.

Notice this by calling pthread_testcancel(), and then try to work out if
async cancellation was ever successful by checking if all threads ran
for the full 10 seconds, or if some were stopped early.
---
 winsup/testsuite/winsup.api/pthread/cancel3.c | 16 +++++++++++++++-
 winsup/testsuite/winsup.api/pthread/cancel5.c | 14 ++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/winsup/testsuite/winsup.api/pthread/cancel3.c b/winsup/testsuite/winsup.api/pthread/cancel3.c
index 07feb7c9b..075f052cc 100644
--- a/winsup/testsuite/winsup.api/pthread/cancel3.c
+++ b/winsup/testsuite/winsup.api/pthread/cancel3.c
@@ -92,6 +92,9 @@ mythread(void * arg)
 	}
     }
 
+  /* Notice if asynchronous cancel got deferred */
+  pthread_testcancel();
+
   return result;
 }
 
@@ -101,6 +104,7 @@ main()
   int failed = 0;
   int i;
   pthread_t t[NUMTHREADS + 1];
+  int ran_to_completion = 0;
 
   assert((t[0] = pthread_self()) != NULL);
 
@@ -130,7 +134,7 @@ main()
    * Standard check that all threads started.
    */
   for (i = 1; i <= NUMTHREADS; i++)
-    { 
+    {
       if (!threadbag[i].started)
 	{
 	  failed |= !threadbag[i].started;
@@ -166,9 +170,19 @@ main()
 		  threadbag[i].count,
 		  result);
 	}
+
+      if (threadbag[i].count >= 10)
+	ran_to_completion++;
+
       failed = (failed || fail);
     }
 
+  if (ran_to_completion >= 10)
+    {
+      fprintf(stderr, "All threads ran to completion, async cancellation never happened\n");
+      failed = TRUE;
+    }
+
   assert(!failed);
 
   /*
diff --git a/winsup/testsuite/winsup.api/pthread/cancel5.c b/winsup/testsuite/winsup.api/pthread/cancel5.c
index 999b3c95c..23c02afe4 100644
--- a/winsup/testsuite/winsup.api/pthread/cancel5.c
+++ b/winsup/testsuite/winsup.api/pthread/cancel5.c
@@ -93,6 +93,9 @@ mythread(void * arg)
 	}
     }
 
+  /* Notice if asynchronous cancel got deferred */
+  pthread_testcancel();
+
   return result;
 }
 
@@ -102,6 +105,7 @@ main()
   int failed = 0;
   int i;
   pthread_t t[NUMTHREADS + 1];
+  int ran_to_completion = 0;
 
   for (i = 1; i <= NUMTHREADS; i++)
     {
@@ -165,9 +169,19 @@ main()
 		  threadbag[i].count,
 		  result);
 	}
+
+      if (threadbag[i].count >= 10)
+       ran_to_completion++;
+
       failed = (failed || fail);
     }
 
+  if (ran_to_completion >= 10)
+    {
+      fprintf(stderr, "All threads ran to completion, async cancellation never happened\n");
+      failed = TRUE;
+    }
+
   assert(!failed);
 
   /*
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-17 11:05             ` Corinna Vinschen
@ 2023-07-17 11:51               ` Jon Turney
  2023-07-17 14:21                 ` Corinna Vinschen
  0 siblings, 1 reply; 37+ messages in thread
From: Jon Turney @ 2023-07-17 11:51 UTC (permalink / raw)
  To: Cygwin Patches

On 17/07/2023 12:05, Corinna Vinschen wrote:
> On Jul 14 20:57, Corinna Vinschen wrote:
>> On Jul 14 14:04, Jon Turney wrote:
>>> On 13/07/2023 19:53, Corinna Vinschen wrote:
>>>>>> Nevertheless, I think this is ok to do.  The description of pthread_cancel
>>>>>> contains this:
>>>>>>
>>>>>>     Asynchronous cancelability means that the thread can be canceled at
>>>>>>     any time (usually immediately, but the system does not guarantee this).
>>>>>>
>>>>>> And
>>>>>>
>>>>>>     The above steps happen asynchronously with respect to the
>>>>>>     pthread_cancel() call; the return status of pthread_cancel() merely
>>>>>>     informs the caller whether the cancellation request was successfully
>>>>>>     queued.
>>>>>>
>>>>>> So any assumption *when* the cancallation takes place is may be wrong.
>>>
>>> Yeah.
>>>
>>> I think the flakiness is when we happen to try to async cancel while in the
>>> Windows kernel, which implicitly converts to a deferred cancellation, but
>>> there are no cancellation points in the the thread, so it arrives at
>>> pthread_exit() and returns a exit code other than PTHREAD_CANCELED.
>>
>> In pthread_join(), right?
>>
>>> I did consider making the test non-flaky by adding a final call to
>>> pthread_testcancel(), to notice any failed async cancellation which has been
>>> converted to a deferred one.
>>>
>>> But then that is just the same as the deferred cancellation tests, and
>>> confirms the cancellation happens, but not that it's async, which is part of
>>> the point of the test.
>>
>> What if Cygwin checks for a deferred cancellation in pthread::exit,
>> too?  It needs to do this by its own, not calling pthread::testcancel,
>> otherwise we're in an infinite loop.  Since cancel is basically like
>> exit, just with a PTHREAD_CANCELED return value, the only additional
>> action would be to set retval to PTHREAD_CANCELED explicitely.
> 
> Kind of like this:
> 
> diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
> index f614e01c42f6..fceb9bda1806 100644
> --- a/winsup/cygwin/thread.cc
> +++ b/winsup/cygwin/thread.cc
> @@ -546,6 +546,13 @@ pthread::exit (void *value_ptr)
>     class pthread *thread = this;
>     _cygtls *tls = cygtls;	/* Save cygtls before deleting this. */
>   
> +  /* Deferred cancellation still pending? */
> +  if (canceled)
> +    {
> +      WaitForSingleObject (cancel_event, INFINITE);
> +      value_ptr = PTHREAD_CANCELED;
> +    }
> +
>     // run cleanup handlers
>     pop_all_cleanup_handlers ();
>   
> What do you think?

I mean, by your own interpretation of the standard, this isn't required, 
because we're allowed to take arbitrarily long to deliver the async 
cancellation, and in this case, we took so long that the thread exited 
before it happened, too bad...

It doesn't seem a bad addition, but this turns pthread_exit() into a 
deferred cancellation point as well, so it should be added to the list 
of "an implementation may also mark other functions not specified in the 
standard as cancellation points" in our documentation^W the huge comment 
in threads.cc.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 00/11] More testsuite fixes
  2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
                   ` (11 preceding siblings ...)
  2023-07-13 18:05 ` [PATCH 00/11] More testsuite fixes Corinna Vinschen
@ 2023-07-17 11:58 ` Jon Turney
  2023-07-17 14:02   ` Corinna Vinschen
  12 siblings, 1 reply; 37+ messages in thread
From: Jon Turney @ 2023-07-17 11:58 UTC (permalink / raw)
  To: Cygwin Patches

[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]

On 13/07/2023 12:38, Jon Turney wrote:
> 
> cancel11: some funkiness I can't work out, causing the save/restoring signal handlers around system() to not
> work correctly

So, the test here: is the SIGINT handle restored correctly if the thread 
executing system() is cancelled. This test fails, because it's not.

It seems like that scenario was explicitly considered when this test was 
added in https://cygwin.com/pipermail/cygwin-patches/2003q1/003378.html

I think maybe this is a regression introduced in 
https://cygwin.com/cgit/newlib-cygwin/commit/?id=3cb9da14617c58c2821c80d48f0bd80a2deb5fdf

child_info_spawn::worker calls waitpid() which ultimately calls 
cygwait() which notices the thread's cancel event is signalled and acts 
as a cancellation point.

Attached is a patch which adds back the restoration of signal handlers 
on thread cancellation.

I can't find any hints in the mailing lists around 2013-04 about what 
problem that change is fixing, but given the commentary, this might be 
reintroducing another problem, though.

[-- Attachment #2: 0001-Cygwin-Restore-pthread-cleanup-of-signal-handlers-du.patch --]
[-- Type: text/plain, Size: 1641 bytes --]

From a798750d271e20402a0a5efc4ac073f5948ad5b7 Mon Sep 17 00:00:00 2001
From: Jon Turney <jon.turney@dronecode.org.uk>
Date: Sun, 16 Jul 2023 14:46:00 +0100
Subject: [PATCH] Cygwin: Restore pthread cleanup of signal handlers during
 system()

Removed in 3cb9da14 which describes it as 'ill-advised' (additional
context doesn't appear to be available).

We can't neatly tuck the pthread_cleanup_push/pop inside the object, as
they are implemented as macros which must appear in the same lexical
scope.
---
 winsup/cygwin/spawn.cc | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/winsup/cygwin/spawn.cc b/winsup/cygwin/spawn.cc
index 84dd74e28..3696ac9b5 100644
--- a/winsup/cygwin/spawn.cc
+++ b/winsup/cygwin/spawn.cc
@@ -257,11 +257,15 @@ struct system_call_handle
   }
   ~system_call_handle ()
   {
-    if (is_system_call ())
+  }
+  static void cleanup (void *arg)
+  {
+# define this_ ((system_call_handle *) arg)
+    if (this_->is_system_call ())
       {
-	signal (SIGINT, oldint);
-	signal (SIGQUIT, oldquit);
-	sigprocmask (SIG_SETMASK, &oldmask, NULL);
+	signal (SIGINT, this_->oldint);
+	signal (SIGQUIT, this_->oldquit);
+	sigprocmask (SIG_SETMASK, &(this_->oldmask), NULL);
       }
   }
 # undef cleanup
@@ -912,8 +916,10 @@ child_info_spawn::worker (const char *prog_arg, const char *const *argv,
 	case _P_WAIT:
 	case _P_SYSTEM:
 	  system_call.arm ();
+	  pthread_cleanup_push (system_call_handle::cleanup, &system_call);
 	  if (waitpid (cygpid, &res, 0) != cygpid)
 	    res = -1;
+	  pthread_cleanup_pop (1);
 	  term_spawn_worker.cleanup ();
 	  break;
 	case _P_DETACH:
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 00/11] More testsuite fixes
  2023-07-17 11:58 ` Jon Turney
@ 2023-07-17 14:02   ` Corinna Vinschen
  2023-07-18 13:37     ` Jon Turney
  0 siblings, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-17 14:02 UTC (permalink / raw)
  To: cygwin-patches

Hi Jon,

On Jul 17 12:58, Jon Turney wrote:
> On 13/07/2023 12:38, Jon Turney wrote:
> > 
> > cancel11: some funkiness I can't work out, causing the save/restoring signal handlers around system() to not
> > work correctly
> 
> So, the test here: is the SIGINT handle restored correctly if the thread
> executing system() is cancelled. This test fails, because it's not.
> 
> It seems like that scenario was explicitly considered when this test was
> added in https://cygwin.com/pipermail/cygwin-patches/2003q1/003378.html
> 
> I think maybe this is a regression introduced in https://cygwin.com/cgit/newlib-cygwin/commit/?id=3cb9da14617c58c2821c80d48f0bd80a2deb5fdf
> 
> child_info_spawn::worker calls waitpid() which ultimately calls cygwait()
> which notices the thread's cancel event is signalled and acts as a
> cancellation point.
> 
> Attached is a patch which adds back the restoration of signal handlers on
> thread cancellation.
> 
> I can't find any hints in the mailing lists around 2013-04 about what
> problem that change is fixing, but given the commentary, this might be
> reintroducing another problem, though.

Maybe, but the patch is slim enough so we might get away with it.

> From a798750d271e20402a0a5efc4ac073f5948ad5b7 Mon Sep 17 00:00:00 2001
> From: Jon Turney <jon.turney@dronecode.org.uk>
> Date: Sun, 16 Jul 2023 14:46:00 +0100
> Subject: [PATCH] Cygwin: Restore pthread cleanup of signal handlers during
>  system()
> 
> Removed in 3cb9da14 which describes it as 'ill-advised' (additional
> context doesn't appear to be available).

Actually, "ill-conceived".  Beats my why, though.

> We can't neatly tuck the pthread_cleanup_push/pop inside the object, as
> they are implemented as macros which must appear in the same lexical
> scope.

You could do it if you call the underlying functions instead.
pthread_cleanup_push is just a convenience macro which initializes a
local __pthread_cleanup_handler, see include/pthread.h.  If you add a
__pthread_cleanup_handler to system_call_handle, you could use it the
same way as the macro and encapsulate the whole thing inside the object.
If you want to...

Fixes and Signed-off-by tags?


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-17 11:51           ` Jon Turney
@ 2023-07-17 14:04             ` Corinna Vinschen
  2023-07-17 14:22               ` Corinna Vinschen
  0 siblings, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-17 14:04 UTC (permalink / raw)
  To: cygwin-patches

On Jul 17 12:51, Jon Turney wrote:
> On 14/07/2023 14:04, Jon Turney wrote:
> > On 13/07/2023 19:53, Corinna Vinschen wrote:
> > > > > > normally after 10 seconds. (See the commentary in pthread::cancel() in
> > > > > > thread.cc, where it checks if the target thread is inside the kernel,
> > > > > > and silently converts the cancellation into a deferred one)
> > > > > 
> > > > > Nevertheless, I think this is ok to do.  The description of
> > > > > pthread_cancel
> > > > > contains this:
> > > > > 
> > > > >    Asynchronous cancelability means that the thread can be canceled at
> > > > >    any time (usually immediately, but the system does not
> > > > > guarantee this).
> > > > > 
> > > > > And
> > > > > 
> > > > >    The above steps happen asynchronously with respect to the
> > > > >    pthread_cancel() call; the return status of pthread_cancel() merely
> > > > >    informs the caller whether the cancellation request was successfully
> > > > >    queued.
> > > > > 
> > > > > So any assumption *when* the cancallation takes place is may be wrong.
> > 
> > Yeah.
> > 
> > I think the flakiness is when we happen to try to async cancel while in
> > the Windows kernel, which implicitly converts to a deferred
> > cancellation, but there are no cancellation points in the the thread, so
> > it arrives at pthread_exit() and returns a exit code other than
> > PTHREAD_CANCELED.
> > 
> > I did consider making the test non-flaky by adding a final call to
> > pthread_testcancel(), to notice any failed async cancellation which has
> > been converted to a deferred one.
> > 
> > But then that is just the same as the deferred cancellation tests, and
> > confirms the cancellation happens, but not that it's async, which is
> > part of the point of the test.
> > 
> > I guess this could also check that not all of the threads ran for all 10
> > seconds, which would indicate that at least some of them were cancelled
> > asynchronously.
> 
> I wrote this, attached, which does indeed make this test more reliable.
> 
> You point is well made that this is making assumptions about how quickly
> async cancellation works that are not required by the standard
> 
> (It would be a valid, if strange implementation, if async cancellation
> always took 10 seconds to take effect, which this test assumes isn't the
> case)
> 
> Perhaps there is a better way to write a test that async cancellation works
> in the absence of cancellation points, but it eludes me...

Same here, so just go ahead.

Thanks,
Corinna


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-17 11:51               ` Jon Turney
@ 2023-07-17 14:21                 ` Corinna Vinschen
  2023-07-17 15:41                   ` Corinna Vinschen
  0 siblings, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-17 14:21 UTC (permalink / raw)
  To: cygwin-patches

On Jul 17 12:51, Jon Turney wrote:
> On 17/07/2023 12:05, Corinna Vinschen wrote:
> > On Jul 14 20:57, Corinna Vinschen wrote:
> > > What if Cygwin checks for a deferred cancellation in pthread::exit,
> > > too?  It needs to do this by its own, not calling pthread::testcancel,
> > > otherwise we're in an infinite loop.  Since cancel is basically like
> > > exit, just with a PTHREAD_CANCELED return value, the only additional
> > > action would be to set retval to PTHREAD_CANCELED explicitely.
> > 
> > Kind of like this:
> > 
> > diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
> > index f614e01c42f6..fceb9bda1806 100644
> > --- a/winsup/cygwin/thread.cc
> > +++ b/winsup/cygwin/thread.cc
> > @@ -546,6 +546,13 @@ pthread::exit (void *value_ptr)
> >     class pthread *thread = this;
> >     _cygtls *tls = cygtls;	/* Save cygtls before deleting this. */
> > +  /* Deferred cancellation still pending? */
> > +  if (canceled)
> > +    {
> > +      WaitForSingleObject (cancel_event, INFINITE);
> > +      value_ptr = PTHREAD_CANCELED;
> > +    }
> > +
> >     // run cleanup handlers
> >     pop_all_cleanup_handlers ();
> > What do you think?
> 
> I mean, by your own interpretation of the standard, this isn't required,
> because we're allowed to take arbitrarily long to deliver the async
> cancellation, and in this case, we took so long that the thread exited
> before it happened, too bad...

True enough!

> It doesn't seem a bad addition,

On second thought...

One thing bugging me is this:

Looking into pthread::cancel we have this order of things:

    // cancel deferred
    mutex.unlock ();
    canceled = true;
    SetEvent (cancel_event);
    return 0; 

The canceled var is set before the SetEvent call.
What if the thread is terminated after canceled is set to true but
before SetEvent is called?

pthread::testcancel claims:

  We check for the canceled flag first. [...]
  Only if the thread is marked as canceled, we wait for cancel_event
  being really set, on the off-chance that pthread_cancel gets
  interrupted before calling SetEvent.

Neat idea to speed up the code, but doesn't that mean we have a
potential deadlock, especially given that pthread::testcancel calls WFSO
with an INFINITE timeout?

And if so, how do we fix this?  Theoretically, the most simple
solution might be to call SetEvent before setting the canceled
variable, but in fact we would have to make setting canceld
and cancel_event an atomic operation.

Another idea is never to wait for an INFINITE time.  Logically, if
canceled is set and cancel_event isn't, the thread just hasn't been
canceled yet.

Any better idea?

> but this turns pthread_exit() into a
> deferred cancellation point as well, so it should be added to the list of
> "an implementation may also mark other functions not specified in the
> standard as cancellation points" in our documentation^W the huge comment in
> threads.cc.

Yes, indeed.


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-17 14:04             ` Corinna Vinschen
@ 2023-07-17 14:22               ` Corinna Vinschen
  0 siblings, 0 replies; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-17 14:22 UTC (permalink / raw)
  To: cygwin-patches

On Jul 17 16:04, Corinna Vinschen wrote:
> On Jul 17 12:51, Jon Turney wrote:
> > Perhaps there is a better way to write a test that async cancellation works
> > in the absence of cancellation points, but it eludes me...
> 
> Same here, so just go ahead.

Actually, it's not just that.  I think this is the right thing to do.


Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-17 14:21                 ` Corinna Vinschen
@ 2023-07-17 15:41                   ` Corinna Vinschen
  2023-07-17 18:23                     ` Corinna Vinschen
  2023-07-18 11:20                     ` Jon Turney
  0 siblings, 2 replies; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-17 15:41 UTC (permalink / raw)
  To: cygwin-patches

On Jul 17 16:21, Corinna Vinschen wrote:
> On Jul 17 12:51, Jon Turney wrote:
> > On 17/07/2023 12:05, Corinna Vinschen wrote:
> > > diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
> > > index f614e01c42f6..fceb9bda1806 100644
> > > --- a/winsup/cygwin/thread.cc
> > > +++ b/winsup/cygwin/thread.cc
> > > @@ -546,6 +546,13 @@ pthread::exit (void *value_ptr)
> > >     class pthread *thread = this;
> > >     _cygtls *tls = cygtls;	/* Save cygtls before deleting this. */
> > > +  /* Deferred cancellation still pending? */
> > > +  if (canceled)
> > > +    {
> > > +      WaitForSingleObject (cancel_event, INFINITE);
> > > +      value_ptr = PTHREAD_CANCELED;
> > > +    }
> > > +
> > >     // run cleanup handlers
> > >     pop_all_cleanup_handlers ();
> > > What do you think?
> > 
> > I mean, by your own interpretation of the standard, this isn't required,
> > because we're allowed to take arbitrarily long to deliver the async
> > cancellation, and in this case, we took so long that the thread exited
> > before it happened, too bad...
> 
> True enough!
> 
> > It doesn't seem a bad addition,
> 
Actually, it seems we actually *have* to do this.  I just searched
for more info on that problem and, to my surprise, I found this in the
most obvious piece of documentation:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_exit.html

Quote:

  As the meaning of the status is determined by the application (except
  when the thread has been canceled, in which case it is
  PTHREAD_CANCELED), [...]

> On second thought...
> 
> One thing bugging me is this:

This is still a bit fuzzy, though.  I'd appreciate any input.

> Looking into pthread::cancel we have this order of things:
> 
>     // cancel deferred
>     mutex.unlock ();
>     canceled = true;
>     SetEvent (cancel_event);
>     return 0; 
> 
> The canceled var is set before the SetEvent call.
> What if the thread is terminated after canceled is set to true but
> before SetEvent is called?
> 
> pthread::testcancel claims:
> 
>   We check for the canceled flag first. [...]
>   Only if the thread is marked as canceled, we wait for cancel_event
>   being really set, on the off-chance that pthread_cancel gets
>   interrupted before calling SetEvent.
> 
> Neat idea to speed up the code, but doesn't that mean we have a
> potential deadlock, especially given that pthread::testcancel calls WFSO
> with an INFINITE timeout?
> 
> And if so, how do we fix this?  Theoretically, the most simple
> solution might be to call SetEvent before setting the canceled
> variable, but in fact we would have to make setting canceld
> and cancel_event an atomic operation.
> 
> Another idea is never to wait for an INFINITE time.  Logically, if
> canceled is set and cancel_event isn't, the thread just hasn't been
> canceled yet.
> 
> Any better idea?
> 
> > but this turns pthread_exit() into a
> > deferred cancellation point as well, so it should be added to the list of
> > "an implementation may also mark other functions not specified in the
> > standard as cancellation points" in our documentation^W the huge comment in
> > threads.cc.
> 
> Yes, indeed.


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-17 15:41                   ` Corinna Vinschen
@ 2023-07-17 18:23                     ` Corinna Vinschen
  2023-07-18 11:20                     ` Jon Turney
  1 sibling, 0 replies; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-17 18:23 UTC (permalink / raw)
  To: cygwin-patches

On Jul 17 17:41, Corinna Vinschen wrote:
> On Jul 17 16:21, Corinna Vinschen wrote:
> > On Jul 17 12:51, Jon Turney wrote:
> > > On 17/07/2023 12:05, Corinna Vinschen wrote:
> > > > diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
> > > > index f614e01c42f6..fceb9bda1806 100644
> > > > --- a/winsup/cygwin/thread.cc
> > > > +++ b/winsup/cygwin/thread.cc
> > > > @@ -546,6 +546,13 @@ pthread::exit (void *value_ptr)
> > > >     class pthread *thread = this;
> > > >     _cygtls *tls = cygtls;	/* Save cygtls before deleting this. */
> > > > +  /* Deferred cancellation still pending? */
> > > > +  if (canceled)
> > > > +    {
> > > > +      WaitForSingleObject (cancel_event, INFINITE);
> > > > +      value_ptr = PTHREAD_CANCELED;
> > > > +    }
> > > > +
> > > >     // run cleanup handlers
> > > >     pop_all_cleanup_handlers ();
> > > > What do you think?
> > > 
> > > I mean, by your own interpretation of the standard, this isn't required,
> > > because we're allowed to take arbitrarily long to deliver the async
> > > cancellation, and in this case, we took so long that the thread exited
> > > before it happened, too bad...
> > 
> > True enough!
> > 
> > > It doesn't seem a bad addition,
> > 
> Actually, it seems we actually *have* to do this.  I just searched
> for more info on that problem and, to my surprise, I found this in the
> most obvious piece of documentation:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_exit.html
> 
> Quote:
> 
>   As the meaning of the status is determined by the application (except
>   when the thread has been canceled, in which case it is
>   PTHREAD_CANCELED), [...]

FTR, apparently I have overinterpreted this sentence.

I performed the following crude test on Linux,, the idea being
to call pthread_cancel and then pthread_exit without hitting a
cancallation point in between.

cat > pt.c <<EOF
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>

int marker = 0;

void *
thread (void *arg)
{
  int oldval;

  pthread_setcanceltype (PTHREAD_CANCEL_DEFERRED, &oldval);
  pthread_setcancelstate (PTHREAD_CANCEL_ENABLE, &oldval);
  marker = 1;
  while (marker < 2)
    ;
  pthread_exit ((void *) 42);
}

int
main ()
{
  int error;
  pthread_t pt;
  void *retval;

  if ((error = pthread_create (&pt, NULL, thread, NULL)))
    {
      printf ("pthread_create: %d %s\n", error, strerror (error));
      return 1;
    }
  while (marker < 1)
    ;
  if ((error = pthread_cancel (pt)))
    {
      marker = 2;
      printf ("pthread_cancel: %d %s\n", error, strerror (error));
      pthread_detach (pt);
      return 1;
    }
  marker = 2;
  if ((error = pthread_join (pt, &retval)))
    {
      printf ("pthread_join: %d %s\n", error, strerror (error));
      pthread_detach (pt);
      return 1;
    }
  printf ("retval = %ld (%d)\n", (uintptr_t) retval, retval == PTHREAD_CANCELED);
  return 0;
}
EOF
$ gcc -g -o pt pt.c -lpthread
$ ./pt
retval = 42 (0)

So retval is the one set by the application, not PTHREAD_CANCELED,
despite the pthread_cancel call.  Looks like handling cancellation
inside pthread_exit is really not the right thing to do...


Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-17 15:41                   ` Corinna Vinschen
  2023-07-17 18:23                     ` Corinna Vinschen
@ 2023-07-18 11:20                     ` Jon Turney
  2023-07-18 12:09                       ` Corinna Vinschen
  1 sibling, 1 reply; 37+ messages in thread
From: Jon Turney @ 2023-07-18 11:20 UTC (permalink / raw)
  To: Corinna Vinschen, Cygwin Patches

On 17/07/2023 16:41, Corinna Vinschen wrote:
> On Jul 17 16:21, Corinna Vinschen wrote:
>> On Jul 17 12:51, Jon Turney wrote:
>>> On 17/07/2023 12:05, Corinna Vinschen wrote:
>>>> diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
>>>> index f614e01c42f6..fceb9bda1806 100644
>>>> --- a/winsup/cygwin/thread.cc
>>>> +++ b/winsup/cygwin/thread.cc
>>>> @@ -546,6 +546,13 @@ pthread::exit (void *value_ptr)
>>>>      class pthread *thread = this;
>>>>      _cygtls *tls = cygtls;	/* Save cygtls before deleting this. */
>>>> +  /* Deferred cancellation still pending? */
>>>> +  if (canceled)
>>>> +    {
>>>> +      WaitForSingleObject (cancel_event, INFINITE);
>>>> +      value_ptr = PTHREAD_CANCELED;
>>>> +    }
>>>> +
>>>>      // run cleanup handlers
>>>>      pop_all_cleanup_handlers ();
>>>> What do you think?
>>>
>>> I mean, by your own interpretation of the standard, this isn't required,
>>> because we're allowed to take arbitrarily long to deliver the async
>>> cancellation, and in this case, we took so long that the thread exited
>>> before it happened, too bad...
>>
>> True enough!
>>
>>> It doesn't seem a bad addition,
>>
> Actually, it seems we actually *have* to do this.  I just searched
> for more info on that problem and, to my surprise, I found this in the
> most obvious piece of documentation:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_exit.html
> 
> Quote:
> 
>    As the meaning of the status is determined by the application (except
>    when the thread has been canceled, in which case it is
>    PTHREAD_CANCELED), [...]
> 
>> On second thought...
>>
>> One thing bugging me is this:
> 
> This is still a bit fuzzy, though.  I'd appreciate any input.
> 
>> Looking into pthread::cancel we have this order of things:
>>
>>      // cancel deferred
>>      mutex.unlock ();
>>      canceled = true;
>>      SetEvent (cancel_event);
>>      return 0;
>>
>> The canceled var is set before the SetEvent call.
>> What if the thread is terminated after canceled is set to true but
>> before SetEvent is called?
>>
>> pthread::testcancel claims:
>>
>>    We check for the canceled flag first. [...]
>>    Only if the thread is marked as canceled, we wait for cancel_event
>>    being really set, on the off-chance that pthread_cancel gets
>>    interrupted before calling SetEvent.
>>
>> Neat idea to speed up the code, but doesn't that mean we have a
>> potential deadlock, especially given that pthread::testcancel calls WFSO
>> with an INFINITE timeout?

I'm not sure I follow: another thread sets cancelled = true, just before 
we hit pthread::testcancel(), so we go into the WFSO, but then the other 
thread continues, signals cancel_event and everything's fine.

What meaning are you assigning to "interrupted" here?

Are we worried about the thread calling pthread_cancel being cancelled 
itself?

>> And if so, how do we fix this?  Theoretically, the most simple
>> solution might be to call SetEvent before setting the canceled
>> variable, but in fact we would have to make setting canceld
>> and cancel_event an atomic operation.

Well, yeah, that is required for them to be coherent. But we have a 
mutex on the thread object for that purpose, and I don't quite see why 
it's released so early here.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-18 11:20                     ` Jon Turney
@ 2023-07-18 12:09                       ` Corinna Vinschen
  2023-07-18 15:52                         ` Jon Turney
  0 siblings, 1 reply; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-18 12:09 UTC (permalink / raw)
  To: cygwin-patches

On Jul 18 12:20, Jon Turney wrote:
> On 17/07/2023 16:41, Corinna Vinschen wrote:
> > > Looking into pthread::cancel we have this order of things:
> > > 
> > >      // cancel deferred
> > >      mutex.unlock ();
> > >      canceled = true;
> > >      SetEvent (cancel_event);
> > >      return 0;
> > > 
> > > The canceled var is set before the SetEvent call.
> > > What if the thread is terminated after canceled is set to true but
> > > before SetEvent is called?
> > > 
> > > pthread::testcancel claims:
> > > 
> > >    We check for the canceled flag first. [...]
> > >    Only if the thread is marked as canceled, we wait for cancel_event
> > >    being really set, on the off-chance that pthread_cancel gets
> > >    interrupted before calling SetEvent.
> > > 
> > > Neat idea to speed up the code, but doesn't that mean we have a
> > > potential deadlock, especially given that pthread::testcancel calls WFSO
> > > with an INFINITE timeout?
> 
> I'm not sure I follow: another thread sets cancelled = true, just before we
> hit pthread::testcancel(), so we go into the WFSO, but then the other thread
> continues, signals cancel_event and everything's fine.
> 
> What meaning are you assigning to "interrupted" here?
> 
> Are we worried about the thread calling pthread_cancel being cancelled
> itself?

Yes.  My concern is if the thread gets terminated between setting
canceled and setting the event object.

Prior to commit 42faed412857, we didn't wait infinitely, just tested the
event object.  Only with adding the canceled variable, we (better: I)
added the the infinite timeout.

I don't see a real reason to do that.  I think this should be changed
to just checking the event object, see the below patch.

> > > And if so, how do we fix this?  Theoretically, the most simple
> > > solution might be to call SetEvent before setting the canceled
> > > variable, but in fact we would have to make setting canceld
> > > and cancel_event an atomic operation.
> 
> Well, yeah, that is required for them to be coherent. But we have a mutex on
> the thread object for that purpose, and I don't quite see why it's released
> so early here.

The mutex is not guarding canceled or the event object.  Thus it's not
used in testcancel either, otherwise introducing the canceled var to
speed up stuff wouldn't have made any sense.


Corinna


commit 518e5e46f064de41d3ef6d6ef743e2e760a46282
Author:     Corinna Vinschen <corinna@vinschen.de>
AuthorDate: Mon Jul 17 18:02:04 2023 +0200
Commit:     Corinna Vinschen <corinna@vinschen.de>
CommitDate: Tue Jul 18 10:11:30 2023 +0200

    Cygwin: don't wait infinitely on a pthread cancel event
    
    Starting with commit 42faed412857 ("* thread.h (class pthread): Add bool
    member canceled."), pthread::testcancel waits infinitely on cancel_event
    after it checked if the canceled variable is set.  However, this might
    introduce a deadlock, if the thread calling pthread_cancel is terminated
    after setting canceled to true, but before calling SetEvent on cancel_event.
    
    In fact, it's not at all necessary to wait infinitely.  By definition,
    the thread is only canceled if cancel_event is set.  The canceled
    variable is just a helper to speed up code.  We can safely assume that
    the thread hasn't been canceled yet, if canceled is set, but cancel_event
    isn't.
    
    Fixes: 42faed412857 ("* thread.h (class pthread): Add bool member canceled.")
    Signed-off-by: Corinna Vinschen <corinna@vinschen.de>

diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
index f614e01c42f6..21e89e146e0a 100644
--- a/winsup/cygwin/thread.cc
+++ b/winsup/cygwin/thread.cc
@@ -961,12 +961,9 @@ pthread::testcancel ()
      pthread_testcancel function a lot without adding the overhead of
      an OS call.  Only if the thread is marked as canceled, we wait for
      cancel_event being really set, on the off-chance that pthread_cancel
-     gets interrupted before calling SetEvent. */
-  if (canceled)
-    {
-      WaitForSingleObject (cancel_event, INFINITE);
-      cancel_self ();
-    }
+     gets interrupted or terminated before calling SetEvent. */
+  if (canceled && IsEventSignalled (cancel_event))
+    cancel_self ();
 }
 
 /* Return cancel event handle if it exists *and* cancel is not disabled.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 00/11] More testsuite fixes
  2023-07-17 14:02   ` Corinna Vinschen
@ 2023-07-18 13:37     ` Jon Turney
  2023-07-18 14:52       ` Corinna Vinschen
  0 siblings, 1 reply; 37+ messages in thread
From: Jon Turney @ 2023-07-18 13:37 UTC (permalink / raw)
  To: Cygwin Patches

[-- Attachment #1: Type: text/plain, Size: 912 bytes --]

On 17/07/2023 15:02, Corinna Vinschen wrote:
> 
>> We can't neatly tuck the pthread_cleanup_push/pop inside the object, as
>> they are implemented as macros which must appear in the same lexical
>> scope.
> 
> You could do it if you call the underlying functions instead.
> pthread_cleanup_push is just a convenience macro which initializes a
> local __pthread_cleanup_handler, see include/pthread.h.  If you add a
> __pthread_cleanup_handler to system_call_handle, you could use it the
> same way as the macro and encapsulate the whole thing inside the object.
> If you want to...

Good point.

Yeah, this seems preferable as it doesn't move the point where we 
restore the signal handlers in the normal flow of execution, which might 
be important, still happening when the system_call_handle object falls 
out of scope and is destroyed.

> 
> Fixes and Signed-off-by tags?
> 

Done.  Revised patch attached.


[-- Attachment #2: 0001-Cygwin-Restore-signal-handlers-on-thread-cancellatio.patch --]
[-- Type: text/plain, Size: 2347 bytes --]

From 18ddda696137106eaa397a01bc06fc97c59df02d Mon Sep 17 00:00:00 2001
From: Jon Turney <jon.turney@dronecode.org.uk>
Date: Sun, 16 Jul 2023 14:46:00 +0100
Subject: [PATCH] Cygwin: Restore signal handlers on thread cancellation during
 system()

Add back the restoration of signal handlers modified during system() on
thread cancellation.

Removed in 3cb9da14 which describes it as 'ill-conceived' (additional
context doesn't appear to be available).

We use internal implementation helpers for pthread cleanup chain, so we
can neatly tuck it inside the object, and keep the point when we restore
the signal handlers the same. The pthread_cleanup_push/pop() functions
are implemented as macros which must appear in the same lexical scope.)

Fixes: 3cb9da14617c ("Put signals on hold and use system_call_cleanup class to set and restore signals rather than doing it prior to to running the program.  Remove the ill-conceived pthread_cleanup stuff.")
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
---
 winsup/cygwin/spawn.cc | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/winsup/cygwin/spawn.cc b/winsup/cygwin/spawn.cc
index 84dd74e28..c16fe269a 100644
--- a/winsup/cygwin/spawn.cc
+++ b/winsup/cygwin/spawn.cc
@@ -228,6 +228,8 @@ struct system_call_handle
   _sig_func_ptr oldint;
   _sig_func_ptr oldquit;
   sigset_t oldmask;
+  __pthread_cleanup_handler cleanup_handler;
+
   bool is_system_call ()
   {
     return oldint != ILLEGAL_SIG_FUNC_PTR;
@@ -253,18 +255,27 @@ struct system_call_handle
 	sigaddset (&child_block, SIGCHLD);
 	sigprocmask (SIG_BLOCK, &child_block, &oldmask);
 	sig_send (NULL, __SIGNOHOLD);
+
+	cleanup_handler = { system_call_handle::cleanup, this, NULL };
+	_pthread_cleanup_push (&cleanup_handler);
       }
   }
   ~system_call_handle ()
   {
     if (is_system_call ())
+      _pthread_cleanup_pop (1);
+  }
+  static void cleanup (void *arg)
+  {
+# define this_ ((system_call_handle *) arg)
+    if (this_->is_system_call ())
       {
-	signal (SIGINT, oldint);
-	signal (SIGQUIT, oldquit);
-	sigprocmask (SIG_SETMASK, &oldmask, NULL);
+	signal (SIGINT, this_->oldint);
+	signal (SIGQUIT, this_->oldquit);
+	sigprocmask (SIG_SETMASK, &(this_->oldmask), NULL);
       }
   }
-# undef cleanup
+# undef this_
 };
 
 child_info_spawn NO_COPY ch_spawn;
-- 
2.39.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 00/11] More testsuite fixes
  2023-07-18 13:37     ` Jon Turney
@ 2023-07-18 14:52       ` Corinna Vinschen
  0 siblings, 0 replies; 37+ messages in thread
From: Corinna Vinschen @ 2023-07-18 14:52 UTC (permalink / raw)
  To: cygwin-patches

On Jul 18 14:37, Jon Turney wrote:
> On 17/07/2023 15:02, Corinna Vinschen wrote:
> > 
> > > We can't neatly tuck the pthread_cleanup_push/pop inside the object, as
> > > they are implemented as macros which must appear in the same lexical
> > > scope.
> > 
> > You could do it if you call the underlying functions instead.
> > pthread_cleanup_push is just a convenience macro which initializes a
> > local __pthread_cleanup_handler, see include/pthread.h.  If you add a
> > __pthread_cleanup_handler to system_call_handle, you could use it the
> > same way as the macro and encapsulate the whole thing inside the object.
> > If you want to...
> 
> Good point.
> 
> Yeah, this seems preferable as it doesn't move the point where we restore
> the signal handlers in the normal flow of execution, which might be
> important, still happening when the system_call_handle object falls out of
> scope and is destroyed.
> 
> > 
> > Fixes and Signed-off-by tags?
> > 
> 
> Done.  Revised patch attached.

Great, looks good!


Thanks,
Corinna

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5
  2023-07-18 12:09                       ` Corinna Vinschen
@ 2023-07-18 15:52                         ` Jon Turney
  0 siblings, 0 replies; 37+ messages in thread
From: Jon Turney @ 2023-07-18 15:52 UTC (permalink / raw)
  To: Cygwin Patches

On 18/07/2023 13:09, Corinna Vinschen wrote:
> On Jul 18 12:20, Jon Turney wrote:
>> On 17/07/2023 16:41, Corinna Vinschen wrote:
>>>> Looking into pthread::cancel we have this order of things:
>>>>
>>>>       // cancel deferred
>>>>       mutex.unlock ();
>>>>       canceled = true;
>>>>       SetEvent (cancel_event);
>>>>       return 0;
>>>>
>>>> The canceled var is set before the SetEvent call.
>>>> What if the thread is terminated after canceled is set to true but
>>>> before SetEvent is called?
>>>>
>>>> pthread::testcancel claims:
>>>>
>>>>     We check for the canceled flag first. [...]
>>>>     Only if the thread is marked as canceled, we wait for cancel_event
>>>>     being really set, on the off-chance that pthread_cancel gets
>>>>     interrupted before calling SetEvent.
>>>>
>>>> Neat idea to speed up the code, but doesn't that mean we have a
>>>> potential deadlock, especially given that pthread::testcancel calls WFSO
>>>> with an INFINITE timeout?
>>
>> I'm not sure I follow: another thread sets cancelled = true, just before we
>> hit pthread::testcancel(), so we go into the WFSO, but then the other thread
>> continues, signals cancel_event and everything's fine.
>>
>> What meaning are you assigning to "interrupted" here?
>>
>> Are we worried about the thread calling pthread_cancel being cancelled
>> itself?
> 
> Yes.  My concern is if the thread gets terminated between setting
> canceled and setting the event object.
> 
> Prior to commit 42faed412857, we didn't wait infinitely, just tested the
> event object.  Only with adding the canceled variable, we (better: I)
> added the the infinite timeout.
> 
> I don't see a real reason to do that.  I think this should be changed
> to just checking the event object, see the below patch.

I see now.  Yes, this makes perfect sense.

>>>> And if so, how do we fix this?  Theoretically, the most simple
>>>> solution might be to call SetEvent before setting the canceled
>>>> variable, but in fact we would have to make setting canceld
>>>> and cancel_event an atomic operation.
>>
>> Well, yeah, that is required for them to be coherent. But we have a mutex on
>> the thread object for that purpose, and I don't quite see why it's released
>> so early here.
> 
> The mutex is not guarding canceled or the event object.  Thus it's not
> used in testcancel either, otherwise introducing the canceled var to
> speed up stuff wouldn't have made any sense.
> 
> 
> Corinna
> 
> 
> commit 518e5e46f064de41d3ef6d6ef743e2e760a46282
> Author:     Corinna Vinschen <corinna@vinschen.de>
> AuthorDate: Mon Jul 17 18:02:04 2023 +0200
> Commit:     Corinna Vinschen <corinna@vinschen.de>
> CommitDate: Tue Jul 18 10:11:30 2023 +0200
> 
>      Cygwin: don't wait infinitely on a pthread cancel event
>      
>      Starting with commit 42faed412857 ("* thread.h (class pthread): Add bool
>      member canceled."), pthread::testcancel waits infinitely on cancel_event
>      after it checked if the canceled variable is set.  However, this might
>      introduce a deadlock, if the thread calling pthread_cancel is terminated
>      after setting canceled to true, but before calling SetEvent on cancel_event.
>      
>      In fact, it's not at all necessary to wait infinitely.  By definition,
>      the thread is only canceled if cancel_event is set.  The canceled
>      variable is just a helper to speed up code.  We can safely assume that
>      the thread hasn't been canceled yet, if canceled is set, but cancel_event
>      isn't.
>      
>      Fixes: 42faed412857 ("* thread.h (class pthread): Add bool member canceled.")
>      Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
> 
> diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
> index f614e01c42f6..21e89e146e0a 100644
> --- a/winsup/cygwin/thread.cc
> +++ b/winsup/cygwin/thread.cc
> @@ -961,12 +961,9 @@ pthread::testcancel ()
>        pthread_testcancel function a lot without adding the overhead of
>        an OS call.  Only if the thread is marked as canceled, we wait for
>        cancel_event being really set, on the off-chance that pthread_cancel
> -     gets interrupted before calling SetEvent. */
> -  if (canceled)
> -    {
> -      WaitForSingleObject (cancel_event, INFINITE);
> -      cancel_self ();
> -    }
> +     gets interrupted or terminated before calling SetEvent. */
> +  if (canceled && IsEventSignalled (cancel_event))
> +    cancel_self ();
>   }
>   
>   /* Return cancel event handle if it exists *and* cancel is not disabled.
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2023-07-18 15:52 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-13 11:38 [PATCH 00/11] More testsuite fixes Jon Turney
2023-07-13 11:38 ` [PATCH 01/11] Cygwin: testsuite: Setup test prereqs in 'installation' the tests run in Jon Turney
2023-07-13 11:38 ` [PATCH 02/11] Cygwin: testsuite: Add a simple timeout mechanism Jon Turney
2023-07-13 11:38 ` [PATCH 03/11] Cygwin: testsuite: Remove const from writable string in fcntl07b Jon Turney
2023-07-13 11:38 ` [PATCH 04/11] Cygwin: testsuite: Skip devdsp test when no audio devices present Jon Turney
2023-07-13 11:38 ` [PATCH 05/11] Cygwin: testsuite: Just log result of second open of /dev/dsp Jon Turney
2023-07-13 11:38 ` [PATCH 06/11] Cygwin: testsuite: Also check direct call in systemcall Jon Turney
2023-07-13 11:39 ` [PATCH 07/11] Cygwin: testsuite: Fix for limited thread priority values Jon Turney
2023-07-13 11:39 ` [PATCH 08/11] Cygwin: testsuite: Busy-wait in cancel3 and cancel5 Jon Turney
2023-07-13 11:43   ` Jon Turney
2023-07-13 18:16   ` Corinna Vinschen
2023-07-13 18:37     ` Corinna Vinschen
2023-07-13 18:53       ` Corinna Vinschen
2023-07-14 13:04         ` Jon Turney
2023-07-14 18:57           ` Corinna Vinschen
2023-07-17 11:05             ` Corinna Vinschen
2023-07-17 11:51               ` Jon Turney
2023-07-17 14:21                 ` Corinna Vinschen
2023-07-17 15:41                   ` Corinna Vinschen
2023-07-17 18:23                     ` Corinna Vinschen
2023-07-18 11:20                     ` Jon Turney
2023-07-18 12:09                       ` Corinna Vinschen
2023-07-18 15:52                         ` Jon Turney
2023-07-17 11:51           ` Jon Turney
2023-07-17 14:04             ` Corinna Vinschen
2023-07-17 14:22               ` Corinna Vinschen
2023-07-13 11:39 ` [PATCH 09/11] Cygwin: testsuite: Fix a buffer overflow in symlink01 Jon Turney
2023-07-13 18:17   ` Corinna Vinschen
2023-07-14 13:04     ` Jon Turney
2023-07-13 11:39 ` [PATCH 10/11] Cygwin: testsuite: Minor fixes to umask03 Jon Turney
2023-07-13 18:18   ` Corinna Vinschen
2023-07-13 11:39 ` [PATCH 11/11] Cygwin: testsuite: Drop Adminstrator privileges while running tests Jon Turney
2023-07-13 18:05 ` [PATCH 00/11] More testsuite fixes Corinna Vinschen
2023-07-17 11:58 ` Jon Turney
2023-07-17 14:02   ` Corinna Vinschen
2023-07-18 13:37     ` Jon Turney
2023-07-18 14:52       ` Corinna Vinschen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).