From: David McFarland <corngood@gmail.com>
To: cygwin-developers@cygwin.com
Subject: Fibers and cygtls
Date: Fri, 04 Sep 2020 12:13:27 -0300 [thread overview]
Message-ID: <vrit4koda82w.fsf@gmail.com> (raw)
For a while now I've been trying to get nix working on Cygwin. It
compiles fine, but when it's run, it will exit unexpectedly in a way
that doesn't return an error code to Cygwin, and doesn't break in gdb.
I tracked this to its use of boost coroutines, which use boost-context
to switch fibers. boost-context can be built using either a custom asm
implementation, or using windows fibers (CreateFiber).
Both the boost-context asm implementation and win32 fibers switch the
NT_TIB StackBase and StackLimit fields (to a user provided buffer) when
switching fibers. StackBase is used to find the cygtls for the current
thread (in _my_tls, and with asm in gendef etc). Because of this, A
program will crash when any syscall is made from a fiber.
I searched for any discussion about this, and couldn't find much.
There's this reference to fibers being broken on Cygwin:
https://schneide.blog/2016/09/19/c-coroutines-on-windows-with-the-fiber-api/
This is a test program which shows the problem using win32 fibers:
===File test.c==============
#include <windows.h>
#include <stdio.h>
#include <unistd.h>
DWORD tls_slot;
#define LOG(x) fprintf(stderr, x " %p %u %u %p %p\n", \
GetCurrentFiber(), \
(unsigned)GetCurrentThreadId(), \
(unsigned)tls_slot, \
TlsGetValue(tls_slot), \
NtCurrentTeb());
VOID WINAPI proc(LPVOID parent) {
LOG("b");
SwitchToFiber(parent);
}
int main() {
tls_slot = TlsAlloc();
TlsSetValue(tls_slot, (void*)0x123);
LOG("a");
LPVOID self = ConvertThreadToFiber(0);
LPVOID fiber = CreateFiber(16 * 1024, &proc, self);
SwitchToFiber(fiber);
LOG("c");
return 0;
}
============================================================
My first idea was to find an unused spot in the TEB to store the cygtls
pointer. This worked, but I couldn't find a field (e.g. user reserved,
or 'spare' bytes), that was safe from being stomped by win32 libs.
The only safe implementation I could think of was to use the win32 TLS
APIs to store the pointer. TlsAlloc is already used in dll_entry to
create a slot for thread entry points. I decided to try reusing that
slot to store the cygtls pointer.
This is my WIP implementation:
===File 0001-Cygwin-store-tls-pointer-in-win32-tls.patch===
From 7a61b556575f03e3e99116dc355aa5999b28c855 Mon Sep 17 00:00:00 2001
From: David McFarland <corngood@gmail.com>
Date: Fri, 4 Sep 2020 10:15:57 -0300
Subject: [PATCH] Cygwin: store tls pointer in win32 tls
---
winsup/cygwin/cygtls.cc | 15 +++++++++++++++
winsup/cygwin/cygtls.h | 6 +++++-
winsup/cygwin/dcrt0.cc | 1 +
winsup/cygwin/fork.cc | 1 +
winsup/cygwin/gendef | 15 ++++++++++-----
winsup/cygwin/include/cygwin/config.h | 2 +-
winsup/cygwin/init.cc | 12 ++++++++----
winsup/cygwin/miscfuncs.cc | 2 ++
8 files changed, 43 insertions(+), 11 deletions(-)
diff --git a/winsup/cygwin/cygtls.cc b/winsup/cygwin/cygtls.cc
index 1a2213d1f..806f6516f 100644
--- a/winsup/cygwin/cygtls.cc
+++ b/winsup/cygwin/cygtls.cc
@@ -16,6 +16,20 @@ details. */
#include "sigproc.h"
#include "exception.h"
+extern DWORD cygtls_slot;
+
+void _set_tls(TEB *teb)
+{
+ TlsSetValue(cygtls_slot, teb->Tib.StackBase);
+}
+
+_cygtls* _current_tls()
+{
+ register void *ret;
+ __asm __volatile__ ("movl cygtls_slot(%%rip),%%r10d\nmovq %%gs:0x1480(,%%r10d,8),%0" : "=r" (ret) : : "r10");
+ return (_cygtls *) ((PBYTE) ret - CYGTLS_PADSIZE);
+}
+
/* Two calls to get the stack right... */
void
_cygtls::call (DWORD (*func) (void *, void *), void *arg)
@@ -24,6 +38,7 @@ _cygtls::call (DWORD (*func) (void *, void *), void *arg)
/* Initialize this thread's ability to respond to things like
SIGSEGV or SIGFPE. */
exception protect;
+ _set_tls();
_my_tls.call2 (func, arg, buf);
}
diff --git a/winsup/cygwin/cygtls.h b/winsup/cygwin/cygtls.h
index a2e3676fc..c5165818f 100644
--- a/winsup/cygwin/cygtls.h
+++ b/winsup/cygwin/cygtls.h
@@ -286,7 +286,11 @@ private:
#include "cygerrno.h"
#include "ntdll.h"
-#define _my_tls (*((_cygtls *) ((PBYTE) NtCurrentTeb()->Tib.StackBase - CYGTLS_PADSIZE)))
+void _set_tls(TEB*);
+inline void _set_tls() { _set_tls(NtCurrentTeb()); }
+_cygtls* _current_tls();
+
+#define _my_tls (*_current_tls())
extern _cygtls *_main_tls;
extern _cygtls *_sig_tls;
diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
index 810017956..634404573 100644
--- a/winsup/cygwin/dcrt0.cc
+++ b/winsup/cygwin/dcrt0.cc
@@ -475,6 +475,7 @@ child_info_fork::alloc_stack ()
StackBase in the child to be the same as in the parent, so that the
computation of _my_tls is correct. */
teb->Tib.StackBase = (PVOID) stackbase;
+ _set_tls(teb);
}
}
diff --git a/winsup/cygwin/fork.cc b/winsup/cygwin/fork.cc
index 7c07b062e..d5cb0fff3 100644
--- a/winsup/cygwin/fork.cc
+++ b/winsup/cygwin/fork.cc
@@ -171,6 +171,7 @@ frok::child (volatile char * volatile here)
myself->pid, myself->ppid, __builtin_frame_address (0));
sigproc_printf ("hParent %p, load_dlls %d", hParent, load_dlls);
+ _set_tls();
/* Make sure threadinfo information is properly set up. */
if (&_my_tls != _main_tls)
{
diff --git a/winsup/cygwin/gendef b/winsup/cygwin/gendef
index 431ec67d3..07bf8e200 100755
--- a/winsup/cygwin/gendef
+++ b/winsup/cygwin/gendef
@@ -130,7 +130,8 @@ EOF
.seh_proc _sigfe_maybe
_sigfe_maybe: # stack is aligned on entry!
.seh_endprologue
- movq %gs:8,%r10 # location of bottom of stack
+ movl cygtls_slot(%rip),%r10d
+ movq %gs:0x1480(,%r10d,8),%r10 # location of bottom of stack
leaq $tls::initialized(%r10),%r11 # where we will be looking
cmpq %r11,%rsp # stack loc > than tls
jge 0f # yep. we don't have a tls.
@@ -143,7 +144,8 @@ _sigfe_maybe: # stack is aligned on entry!
.seh_proc _sigfe
_sigfe: # stack is aligned on entry!
.seh_endprologue
- movq %gs:8,%r10 # location of bottom of stack
+ movl cygtls_slot(%rip),%r10d
+ movq %gs:0x1480(,%r10d,8),%r10 # location of bottom of stack
1: movl \$1,%r11d # potential lock value
xchgl %r11d,$tls::stacklock(%r10) # see if we can grab it
movl %r11d,$tls::spinning(%r10) # flag if we are waiting for lock
@@ -167,7 +169,8 @@ _sigfe: # stack is aligned on entry!
_sigbe: # return here after cygwin syscall
# stack is aligned on entry!
.seh_endprologue
- movq %gs:8,%r10 # address of bottom of tls
+ movl cygtls_slot(%rip),%r10d
+ movq %gs:0x1480(,%r10d,8),%r10 # location of bottom of stack
1: movl \$1,%r11d # potential lock value
xchgl %r11d,$tls::stacklock(%r10) # see if we can grab it
movl %r11d,$tls::spinning(%r10) # flag if we are waiting for lock
@@ -247,7 +250,8 @@ sigdelayed:
movdqa %xmm0,0x20(%rsp)
.seh_endprologue
- movq %gs:8,%r12 # get tls
+ movl cygtls_slot(%rip),%r12d
+ movq %gs:0x1480(,%r12d,8),%r12 # location of bottom of stack
movl $tls::saved_errno(%r12),%r15d # temporarily save saved_errno
movq \$$tls::start_offset,%rcx # point to beginning of tls block
addq %r12,%rcx # and store as first arg to method
@@ -368,7 +372,8 @@ stabilize_sig_stack:
subq \$0x20,%rsp
.seh_stackalloc 32
.seh_endprologue
- movq %gs:8,%r12
+ movl cygtls_slot(%rip),%r12d
+ movq %gs:0x1480(,%r12d,8),%r12 # location of bottom of stack
1: movl \$1,%r10d
xchgl %r10d,$tls::stacklock(%r12)
movl %r10d,$tls::spinning(%r12) # flag if we are waiting for lock
diff --git a/winsup/cygwin/include/cygwin/config.h b/winsup/cygwin/include/cygwin/config.h
index 282637b67..498d6fdc6 100644
--- a/winsup/cygwin/include/cygwin/config.h
+++ b/winsup/cygwin/include/cygwin/config.h
@@ -46,7 +46,7 @@ extern inline struct _reent *__getreent (void)
{
register char *ret;
#ifdef __x86_64__
- __asm __volatile__ ("movq %%gs:8,%0" : "=r" (ret));
+ __asm __volatile__ ("movl cygtls_slot(%%rip),%%r10d\nmovq %%gs:0x1480(,%%r10d,8),%0" : "=r" (ret) : : "r10");
#else
__asm __volatile__ ("movl %%fs:4,%0" : "=r" (ret));
#endif
diff --git a/winsup/cygwin/init.cc b/winsup/cygwin/init.cc
index 7787b164c..82b01b7aa 100644
--- a/winsup/cygwin/init.cc
+++ b/winsup/cygwin/init.cc
@@ -11,7 +11,7 @@ details. */
#include "ntdll.h"
#include "shared_info.h"
-static DWORD _my_oldfunc;
+DWORD NO_COPY cygtls_slot;
static char *search_for = (char *) cygthread::stub;
unsigned threadfunc_ix[8];
@@ -25,7 +25,9 @@ __attribute__ ((force_align_arg_pointer))
static void WINAPI
threadfunc_fe (VOID *arg)
{
- _cygtls::call ((DWORD (*) (void *, void *)) TlsGetValue (_my_oldfunc), arg);
+ PVOID f = TlsGetValue (cygtls_slot);
+ _set_tls();
+ _cygtls::call ((DWORD (*) (void *, void *)) f, arg);
}
/* If possible, redirect the thread entry point to a cygwin routine which
@@ -62,7 +64,7 @@ munge_threadfunc ()
for (i = 0; threadfunc_ix[i]; i++)
if (!threadfunc || ebp[threadfunc_ix[i]] == threadfunc)
ebp[threadfunc_ix[i]] = (char *) threadfunc_fe;
- TlsSetValue (_my_oldfunc, threadfunc);
+ TlsSetValue (cygtls_slot, threadfunc);
}
}
}
@@ -81,6 +83,8 @@ dll_entry (HANDLE h, DWORD reason, void *static_load)
switch (reason)
{
case DLL_PROCESS_ATTACH:
+ cygtls_slot = TlsAlloc ();
+ _set_tls();
init_console_handler (false);
cygwin_hmodule = (HMODULE) h;
@@ -97,7 +101,6 @@ dll_entry (HANDLE h, DWORD reason, void *static_load)
memcpy (_REENT, _GLOBAL_REENT, sizeof (struct _reent));
dll_crt0_0 ();
- _my_oldfunc = TlsAlloc ();
dll_finished_loading = true;
break;
case DLL_PROCESS_DETACH:
@@ -105,6 +108,7 @@ dll_entry (HANDLE h, DWORD reason, void *static_load)
shared_destroy ();
break;
case DLL_THREAD_ATTACH:
+ _set_tls();
if (dll_finished_loading)
munge_threadfunc ();
break;
diff --git a/winsup/cygwin/miscfuncs.cc b/winsup/cygwin/miscfuncs.cc
index cba2140b6..090eb4cca 100644
--- a/winsup/cygwin/miscfuncs.cc
+++ b/winsup/cygwin/miscfuncs.cc
@@ -378,6 +378,7 @@ pthread_wrapper (PVOID arg)
/* Set stack values in TEB */
PTEB teb = NtCurrentTeb ();
teb->Tib.StackBase = wrapper_arg.stackbase;
+ _set_tls();
teb->Tib.StackLimit = wrapper_arg.stacklimit ?: wrapper_arg.stackaddr;
/* Set DeallocationStack value. If we have an application-provided stack,
we set DeallocationStack to NULL, so NtTerminateThread does not deallocate
@@ -647,6 +648,7 @@ create_new_main_thread_stack (PVOID &allocationbase, SIZE_T parent_commitsize)
return NULL;
NtCurrentTeb()->Tib.StackBase = ((PBYTE) allocationbase + stacksize);
NtCurrentTeb()->Tib.StackLimit = stacklimit;
+ _set_tls();
return ((PBYTE) allocationbase + stacksize - 16);
}
#endif
--
2.28.0
============================================================
This still needs work obviously:
- _set_tls() calls are scattered a little haphazardly
- the asm will only work if TlsAlloc gets one of the first 64 TLS slots
- I'm concerned about cygtls_slot being used before init (after fork?)
Anyone have any thoughts on this?
next reply other threads:[~2020-09-04 15:13 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-04 15:13 David McFarland [this message]
2020-09-04 19:51 ` Corinna Vinschen
2020-09-05 2:46 ` David McFarland
2021-12-20 15:01 ` David McFarland
2021-12-22 15:46 ` David McFarland
2020-09-06 14:19 ` Jon Turney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=vrit4koda82w.fsf@gmail.com \
--to=corngood@gmail.com \
--cc=cygwin-developers@cygwin.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).