public inbox for cygwin-developers@cygwin.com
 help / color / mirror / Atom feed
* Fibers and cygtls
@ 2020-09-04 15:13 David McFarland
  2020-09-04 19:51 ` Corinna Vinschen
  2020-09-06 14:19 ` Jon Turney
  0 siblings, 2 replies; 6+ messages in thread
From: David McFarland @ 2020-09-04 15:13 UTC (permalink / raw)
  To: cygwin-developers

For a while now I've been trying to get nix working on Cygwin. It
compiles fine, but when it's run, it will exit unexpectedly in a way
that doesn't return an error code to Cygwin, and doesn't break in gdb.

I tracked this to its use of boost coroutines, which use boost-context
to switch fibers. boost-context can be built using either a custom asm
implementation, or using windows fibers (CreateFiber).

Both the boost-context asm implementation and win32 fibers switch the
NT_TIB StackBase and StackLimit fields (to a user provided buffer) when
switching fibers. StackBase is used to find the cygtls for the current
thread (in _my_tls, and with asm in gendef etc). Because of this, A
program will crash when any syscall is made from a fiber.

I searched for any discussion about this, and couldn't find much.
There's this reference to fibers being broken on Cygwin:

https://schneide.blog/2016/09/19/c-coroutines-on-windows-with-the-fiber-api/

This is a test program which shows the problem using win32 fibers:

===File test.c==============
#include <windows.h>
#include <stdio.h>
#include <unistd.h>

DWORD tls_slot;

#define LOG(x) fprintf(stderr, x " %p %u %u %p %p\n", \
        GetCurrentFiber(), \
        (unsigned)GetCurrentThreadId(), \
        (unsigned)tls_slot, \
        TlsGetValue(tls_slot), \
        NtCurrentTeb());

VOID WINAPI proc(LPVOID parent) {
        LOG("b");
        SwitchToFiber(parent);
}

int main() {
        tls_slot = TlsAlloc();
        TlsSetValue(tls_slot, (void*)0x123);
        LOG("a");
        LPVOID self = ConvertThreadToFiber(0);
        LPVOID fiber = CreateFiber(16 * 1024, &proc, self);
        SwitchToFiber(fiber);
        LOG("c");
        return 0;
}
============================================================

My first idea was to find an unused spot in the TEB to store the cygtls
pointer. This worked, but I couldn't find a field (e.g. user reserved,
or 'spare' bytes), that was safe from being stomped by win32 libs.

The only safe implementation I could think of was to use the win32 TLS
APIs to store the pointer. TlsAlloc is already used in dll_entry to
create a slot for thread entry points. I decided to try reusing that
slot to store the cygtls pointer.

This is my WIP implementation:

===File 0001-Cygwin-store-tls-pointer-in-win32-tls.patch===
From 7a61b556575f03e3e99116dc355aa5999b28c855 Mon Sep 17 00:00:00 2001
From: David McFarland <corngood@gmail.com>
Date: Fri, 4 Sep 2020 10:15:57 -0300
Subject: [PATCH] Cygwin: store tls pointer in win32 tls

---
 winsup/cygwin/cygtls.cc               | 15 +++++++++++++++
 winsup/cygwin/cygtls.h                |  6 +++++-
 winsup/cygwin/dcrt0.cc                |  1 +
 winsup/cygwin/fork.cc                 |  1 +
 winsup/cygwin/gendef                  | 15 ++++++++++-----
 winsup/cygwin/include/cygwin/config.h |  2 +-
 winsup/cygwin/init.cc                 | 12 ++++++++----
 winsup/cygwin/miscfuncs.cc            |  2 ++
 8 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/winsup/cygwin/cygtls.cc b/winsup/cygwin/cygtls.cc
index 1a2213d1f..806f6516f 100644
--- a/winsup/cygwin/cygtls.cc
+++ b/winsup/cygwin/cygtls.cc
@@ -16,6 +16,20 @@ details. */
 #include "sigproc.h"
 #include "exception.h"
 
+extern DWORD cygtls_slot;
+
+void _set_tls(TEB *teb)
+{
+  TlsSetValue(cygtls_slot, teb->Tib.StackBase);
+}
+
+_cygtls* _current_tls()
+{
+  register void *ret;
+  __asm __volatile__ ("movl cygtls_slot(%%rip),%%r10d\nmovq %%gs:0x1480(,%%r10d,8),%0" : "=r" (ret) : : "r10");
+  return (_cygtls *) ((PBYTE) ret - CYGTLS_PADSIZE);
+}
+
 /* Two calls to get the stack right... */
 void
 _cygtls::call (DWORD (*func) (void *, void *), void *arg)
@@ -24,6 +38,7 @@ _cygtls::call (DWORD (*func) (void *, void *), void *arg)
   /* Initialize this thread's ability to respond to things like
      SIGSEGV or SIGFPE. */
   exception protect;
+  _set_tls();
   _my_tls.call2 (func, arg, buf);
 }
 
diff --git a/winsup/cygwin/cygtls.h b/winsup/cygwin/cygtls.h
index a2e3676fc..c5165818f 100644
--- a/winsup/cygwin/cygtls.h
+++ b/winsup/cygwin/cygtls.h
@@ -286,7 +286,11 @@ private:
 #include "cygerrno.h"
 #include "ntdll.h"
 
-#define _my_tls (*((_cygtls *) ((PBYTE) NtCurrentTeb()->Tib.StackBase - CYGTLS_PADSIZE)))
+void _set_tls(TEB*);
+inline void _set_tls() { _set_tls(NtCurrentTeb()); }
+_cygtls* _current_tls();
+
+#define _my_tls (*_current_tls())
 extern _cygtls *_main_tls;
 extern _cygtls *_sig_tls;
 
diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
index 810017956..634404573 100644
--- a/winsup/cygwin/dcrt0.cc
+++ b/winsup/cygwin/dcrt0.cc
@@ -475,6 +475,7 @@ child_info_fork::alloc_stack ()
 	 StackBase in the child to be the same as in the parent, so that the
 	 computation of _my_tls is correct. */
       teb->Tib.StackBase = (PVOID) stackbase;
+      _set_tls(teb);
     }
 }
 
diff --git a/winsup/cygwin/fork.cc b/winsup/cygwin/fork.cc
index 7c07b062e..d5cb0fff3 100644
--- a/winsup/cygwin/fork.cc
+++ b/winsup/cygwin/fork.cc
@@ -171,6 +171,7 @@ frok::child (volatile char * volatile here)
 		myself->pid, myself->ppid, __builtin_frame_address (0));
   sigproc_printf ("hParent %p, load_dlls %d", hParent, load_dlls);
 
+  _set_tls();
   /* Make sure threadinfo information is properly set up. */
   if (&_my_tls != _main_tls)
     {
diff --git a/winsup/cygwin/gendef b/winsup/cygwin/gendef
index 431ec67d3..07bf8e200 100755
--- a/winsup/cygwin/gendef
+++ b/winsup/cygwin/gendef
@@ -130,7 +130,8 @@ EOF
 	.seh_proc _sigfe_maybe
 _sigfe_maybe:					# stack is aligned on entry!
 	.seh_endprologue
-	movq	%gs:8,%r10			# location of bottom of stack
+	movl	cygtls_slot(%rip),%r10d
+	movq	%gs:0x1480(,%r10d,8),%r10			# location of bottom of stack
 	leaq	$tls::initialized(%r10),%r11	# where we will be looking
 	cmpq	%r11,%rsp			# stack loc > than tls
 	jge	0f				# yep.  we don't have a tls.
@@ -143,7 +144,8 @@ _sigfe_maybe:					# stack is aligned on entry!
 	.seh_proc _sigfe
 _sigfe:						# stack is aligned on entry!
 	.seh_endprologue
-	movq	%gs:8,%r10			# location of bottom of stack
+	movl	cygtls_slot(%rip),%r10d
+	movq	%gs:0x1480(,%r10d,8),%r10			# location of bottom of stack
 1:	movl	\$1,%r11d			# potential lock value
 	xchgl	%r11d,$tls::stacklock(%r10)	# see if we can grab it
 	movl	%r11d,$tls::spinning(%r10)	# flag if we are waiting for lock
@@ -167,7 +169,8 @@ _sigfe:						# stack is aligned on entry!
 _sigbe:						# return here after cygwin syscall
 						# stack is aligned on entry!
 	.seh_endprologue
-	movq	%gs:8,%r10			# address of bottom of tls
+	movl	cygtls_slot(%rip),%r10d
+	movq	%gs:0x1480(,%r10d,8),%r10			# location of bottom of stack
 1:	movl	\$1,%r11d			# potential lock value
 	xchgl	%r11d,$tls::stacklock(%r10)	# see if we can grab it
 	movl	%r11d,$tls::spinning(%r10)	# flag if we are waiting for lock
@@ -247,7 +250,8 @@ sigdelayed:
 	movdqa	%xmm0,0x20(%rsp)
 	.seh_endprologue
 
-	movq	%gs:8,%r12			# get tls
+	movl	cygtls_slot(%rip),%r12d
+	movq	%gs:0x1480(,%r12d,8),%r12			# location of bottom of stack
 	movl	$tls::saved_errno(%r12),%r15d	# temporarily save saved_errno
 	movq	\$$tls::start_offset,%rcx	# point to beginning of tls block
 	addq	%r12,%rcx			#  and store as first arg to method
@@ -368,7 +372,8 @@ stabilize_sig_stack:
 	subq	\$0x20,%rsp
 	.seh_stackalloc 32
 	.seh_endprologue
-	movq	%gs:8,%r12
+	movl	cygtls_slot(%rip),%r12d
+	movq	%gs:0x1480(,%r12d,8),%r12			# location of bottom of stack
 1:	movl	\$1,%r10d
 	xchgl	%r10d,$tls::stacklock(%r12)
 	movl	%r10d,$tls::spinning(%r12)	# flag if we are waiting for lock
diff --git a/winsup/cygwin/include/cygwin/config.h b/winsup/cygwin/include/cygwin/config.h
index 282637b67..498d6fdc6 100644
--- a/winsup/cygwin/include/cygwin/config.h
+++ b/winsup/cygwin/include/cygwin/config.h
@@ -46,7 +46,7 @@ extern inline struct _reent *__getreent (void)
 {
   register char *ret;
 #ifdef __x86_64__
-  __asm __volatile__ ("movq %%gs:8,%0" : "=r" (ret));
+  __asm __volatile__ ("movl cygtls_slot(%%rip),%%r10d\nmovq %%gs:0x1480(,%%r10d,8),%0" : "=r" (ret) : : "r10");
 #else
   __asm __volatile__ ("movl %%fs:4,%0" : "=r" (ret));
 #endif
diff --git a/winsup/cygwin/init.cc b/winsup/cygwin/init.cc
index 7787b164c..82b01b7aa 100644
--- a/winsup/cygwin/init.cc
+++ b/winsup/cygwin/init.cc
@@ -11,7 +11,7 @@ details. */
 #include "ntdll.h"
 #include "shared_info.h"
 
-static DWORD _my_oldfunc;
+DWORD NO_COPY cygtls_slot;
 
 static char *search_for  = (char *) cygthread::stub;
 unsigned threadfunc_ix[8];
@@ -25,7 +25,9 @@ __attribute__ ((force_align_arg_pointer))
 static void WINAPI
 threadfunc_fe (VOID *arg)
 {
-  _cygtls::call ((DWORD (*)  (void *, void *)) TlsGetValue (_my_oldfunc), arg);
+  PVOID f = TlsGetValue (cygtls_slot);
+  _set_tls();
+  _cygtls::call ((DWORD (*)  (void *, void *)) f, arg);
 }
 
 /* If possible, redirect the thread entry point to a cygwin routine which
@@ -62,7 +64,7 @@ munge_threadfunc ()
 	  for (i = 0; threadfunc_ix[i]; i++)
 	    if (!threadfunc || ebp[threadfunc_ix[i]] == threadfunc)
 	       ebp[threadfunc_ix[i]] = (char *) threadfunc_fe;
-	  TlsSetValue (_my_oldfunc, threadfunc);
+	  TlsSetValue (cygtls_slot, threadfunc);
 	}
     }
 }
@@ -81,6 +83,8 @@ dll_entry (HANDLE h, DWORD reason, void *static_load)
   switch (reason)
     {
     case DLL_PROCESS_ATTACH:
+      cygtls_slot = TlsAlloc ();
+      _set_tls();
       init_console_handler (false);
 
       cygwin_hmodule = (HMODULE) h;
@@ -97,7 +101,6 @@ dll_entry (HANDLE h, DWORD reason, void *static_load)
       memcpy (_REENT, _GLOBAL_REENT, sizeof (struct _reent));
 
       dll_crt0_0 ();
-      _my_oldfunc = TlsAlloc ();
       dll_finished_loading = true;
       break;
     case DLL_PROCESS_DETACH:
@@ -105,6 +108,7 @@ dll_entry (HANDLE h, DWORD reason, void *static_load)
 	shared_destroy ();
       break;
     case DLL_THREAD_ATTACH:
+      _set_tls();
       if (dll_finished_loading)
 	munge_threadfunc ();
       break;
diff --git a/winsup/cygwin/miscfuncs.cc b/winsup/cygwin/miscfuncs.cc
index cba2140b6..090eb4cca 100644
--- a/winsup/cygwin/miscfuncs.cc
+++ b/winsup/cygwin/miscfuncs.cc
@@ -378,6 +378,7 @@ pthread_wrapper (PVOID arg)
   /* Set stack values in TEB */
   PTEB teb = NtCurrentTeb ();
   teb->Tib.StackBase = wrapper_arg.stackbase;
+  _set_tls();
   teb->Tib.StackLimit = wrapper_arg.stacklimit ?: wrapper_arg.stackaddr;
   /* Set DeallocationStack value.  If we have an application-provided stack,
      we set DeallocationStack to NULL, so NtTerminateThread does not deallocate
@@ -647,6 +648,7 @@ create_new_main_thread_stack (PVOID &allocationbase, SIZE_T parent_commitsize)
     return NULL;
   NtCurrentTeb()->Tib.StackBase = ((PBYTE) allocationbase + stacksize);
   NtCurrentTeb()->Tib.StackLimit = stacklimit;
+  _set_tls();
   return ((PBYTE) allocationbase + stacksize - 16);
 }
 #endif
-- 
2.28.0

============================================================

This still needs work obviously:

- _set_tls() calls are scattered a little haphazardly
- the asm will only work if TlsAlloc gets one of the first 64 TLS slots
- I'm concerned about cygtls_slot being used before init (after fork?)

Anyone have any thoughts on this?

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-12-22 15:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-04 15:13 Fibers and cygtls David McFarland
2020-09-04 19:51 ` Corinna Vinschen
2020-09-05  2:46   ` David McFarland
2021-12-20 15:01     ` David McFarland
2021-12-22 15:46       ` David McFarland
2020-09-06 14:19 ` Jon Turney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).