From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by sourceware.org (Postfix) with ESMTPS id 8A6F83857836 for ; Fri, 4 Sep 2020 15:13:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 8A6F83857836 Received: by mail-qt1-x842.google.com with SMTP id b3so4797971qtg.13 for ; Fri, 04 Sep 2020 08:13:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:mime-version; bh=ViIhe4pR90qUoPXUucf5vTWBO1kw5jMiGsSqCIz6nZM=; b=Fmg7IPtQdrSJQ/E72PFBStlyYRKfbToY30ykwyJuftx6YaWmdQyPWRBUIHSlHn2OS2 FjIllWLA9/ujmS2k9OCjsk/EUv/twiGw8hHIfTSBBY/uXGqtzUFCOrJR42oD1LEF8y8B 0A5qzzEkNWU3ZWv8voPVZJWRPMW+Gw7HgLDBwIPsKwrwWyNh5tDBxQ+1gbIcHQGhUUmw Q7O4ri0z52dt/u/4AGVNZRy5Em9HMwYOeM6Egfpyv/PcOpV4AzZgH2PXMacakmgqrJjo 7i6MfkmWAU1FkLFbWnF3/rQ+koYV78cd4VQ1vBRXjSGfQr01XLhNvKU9HdXvjmbntHuL fSZw== X-Gm-Message-State: AOAM530gmm5wvtQvuJAcCJ4h1qqv0aEWu7H+eqpaLxtZpBFcitKLZ2Sc W5ZSip02O8vf+z5FU7zcwwV9GKffVXo= X-Google-Smtp-Source: ABdhPJzNthPoZ9emmwxY1ymVAtjD6KpeZf4PMG2HGmiGc707Ya97lPoRcz837PhV6QaaQgTMnnW9Dg== X-Received: by 2002:ac8:5351:: with SMTP id d17mr1781981qto.267.1599232408906; Fri, 04 Sep 2020 08:13:28 -0700 (PDT) Received: from davidm-laptop (host-24-138-77-139.public.eastlink.ca. [24.138.77.139]) by smtp.gmail.com with ESMTPSA id w194sm4653668qkb.130.2020.09.04.08.13.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 04 Sep 2020 08:13:28 -0700 (PDT) From: David McFarland To: cygwin-developers@cygwin.com Subject: Fibers and cygtls Date: Fri, 04 Sep 2020 12:13:27 -0300 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: cygwin-developers@cygwin.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Cygwin core component developers mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Sep 2020 15:13:31 -0000 For a while now I've been trying to get nix working on Cygwin. It compiles fine, but when it's run, it will exit unexpectedly in a way that doesn't return an error code to Cygwin, and doesn't break in gdb. I tracked this to its use of boost coroutines, which use boost-context to switch fibers. boost-context can be built using either a custom asm implementation, or using windows fibers (CreateFiber). Both the boost-context asm implementation and win32 fibers switch the NT_TIB StackBase and StackLimit fields (to a user provided buffer) when switching fibers. StackBase is used to find the cygtls for the current thread (in _my_tls, and with asm in gendef etc). Because of this, A program will crash when any syscall is made from a fiber. I searched for any discussion about this, and couldn't find much. There's this reference to fibers being broken on Cygwin: https://schneide.blog/2016/09/19/c-coroutines-on-windows-with-the-fiber-api/ This is a test program which shows the problem using win32 fibers: ===File test.c============== #include #include #include DWORD tls_slot; #define LOG(x) fprintf(stderr, x " %p %u %u %p %p\n", \ GetCurrentFiber(), \ (unsigned)GetCurrentThreadId(), \ (unsigned)tls_slot, \ TlsGetValue(tls_slot), \ NtCurrentTeb()); VOID WINAPI proc(LPVOID parent) { LOG("b"); SwitchToFiber(parent); } int main() { tls_slot = TlsAlloc(); TlsSetValue(tls_slot, (void*)0x123); LOG("a"); LPVOID self = ConvertThreadToFiber(0); LPVOID fiber = CreateFiber(16 * 1024, &proc, self); SwitchToFiber(fiber); LOG("c"); return 0; } ============================================================ My first idea was to find an unused spot in the TEB to store the cygtls pointer. This worked, but I couldn't find a field (e.g. user reserved, or 'spare' bytes), that was safe from being stomped by win32 libs. The only safe implementation I could think of was to use the win32 TLS APIs to store the pointer. TlsAlloc is already used in dll_entry to create a slot for thread entry points. I decided to try reusing that slot to store the cygtls pointer. This is my WIP implementation: ===File 0001-Cygwin-store-tls-pointer-in-win32-tls.patch=== >From 7a61b556575f03e3e99116dc355aa5999b28c855 Mon Sep 17 00:00:00 2001 From: David McFarland Date: Fri, 4 Sep 2020 10:15:57 -0300 Subject: [PATCH] Cygwin: store tls pointer in win32 tls --- winsup/cygwin/cygtls.cc | 15 +++++++++++++++ winsup/cygwin/cygtls.h | 6 +++++- winsup/cygwin/dcrt0.cc | 1 + winsup/cygwin/fork.cc | 1 + winsup/cygwin/gendef | 15 ++++++++++----- winsup/cygwin/include/cygwin/config.h | 2 +- winsup/cygwin/init.cc | 12 ++++++++---- winsup/cygwin/miscfuncs.cc | 2 ++ 8 files changed, 43 insertions(+), 11 deletions(-) diff --git a/winsup/cygwin/cygtls.cc b/winsup/cygwin/cygtls.cc index 1a2213d1f..806f6516f 100644 --- a/winsup/cygwin/cygtls.cc +++ b/winsup/cygwin/cygtls.cc @@ -16,6 +16,20 @@ details. */ #include "sigproc.h" #include "exception.h" +extern DWORD cygtls_slot; + +void _set_tls(TEB *teb) +{ + TlsSetValue(cygtls_slot, teb->Tib.StackBase); +} + +_cygtls* _current_tls() +{ + register void *ret; + __asm __volatile__ ("movl cygtls_slot(%%rip),%%r10d\nmovq %%gs:0x1480(,%%r10d,8),%0" : "=r" (ret) : : "r10"); + return (_cygtls *) ((PBYTE) ret - CYGTLS_PADSIZE); +} + /* Two calls to get the stack right... */ void _cygtls::call (DWORD (*func) (void *, void *), void *arg) @@ -24,6 +38,7 @@ _cygtls::call (DWORD (*func) (void *, void *), void *arg) /* Initialize this thread's ability to respond to things like SIGSEGV or SIGFPE. */ exception protect; + _set_tls(); _my_tls.call2 (func, arg, buf); } diff --git a/winsup/cygwin/cygtls.h b/winsup/cygwin/cygtls.h index a2e3676fc..c5165818f 100644 --- a/winsup/cygwin/cygtls.h +++ b/winsup/cygwin/cygtls.h @@ -286,7 +286,11 @@ private: #include "cygerrno.h" #include "ntdll.h" -#define _my_tls (*((_cygtls *) ((PBYTE) NtCurrentTeb()->Tib.StackBase - CYGTLS_PADSIZE))) +void _set_tls(TEB*); +inline void _set_tls() { _set_tls(NtCurrentTeb()); } +_cygtls* _current_tls(); + +#define _my_tls (*_current_tls()) extern _cygtls *_main_tls; extern _cygtls *_sig_tls; diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc index 810017956..634404573 100644 --- a/winsup/cygwin/dcrt0.cc +++ b/winsup/cygwin/dcrt0.cc @@ -475,6 +475,7 @@ child_info_fork::alloc_stack () StackBase in the child to be the same as in the parent, so that the computation of _my_tls is correct. */ teb->Tib.StackBase = (PVOID) stackbase; + _set_tls(teb); } } diff --git a/winsup/cygwin/fork.cc b/winsup/cygwin/fork.cc index 7c07b062e..d5cb0fff3 100644 --- a/winsup/cygwin/fork.cc +++ b/winsup/cygwin/fork.cc @@ -171,6 +171,7 @@ frok::child (volatile char * volatile here) myself->pid, myself->ppid, __builtin_frame_address (0)); sigproc_printf ("hParent %p, load_dlls %d", hParent, load_dlls); + _set_tls(); /* Make sure threadinfo information is properly set up. */ if (&_my_tls != _main_tls) { diff --git a/winsup/cygwin/gendef b/winsup/cygwin/gendef index 431ec67d3..07bf8e200 100755 --- a/winsup/cygwin/gendef +++ b/winsup/cygwin/gendef @@ -130,7 +130,8 @@ EOF .seh_proc _sigfe_maybe _sigfe_maybe: # stack is aligned on entry! .seh_endprologue - movq %gs:8,%r10 # location of bottom of stack + movl cygtls_slot(%rip),%r10d + movq %gs:0x1480(,%r10d,8),%r10 # location of bottom of stack leaq $tls::initialized(%r10),%r11 # where we will be looking cmpq %r11,%rsp # stack loc > than tls jge 0f # yep. we don't have a tls. @@ -143,7 +144,8 @@ _sigfe_maybe: # stack is aligned on entry! .seh_proc _sigfe _sigfe: # stack is aligned on entry! .seh_endprologue - movq %gs:8,%r10 # location of bottom of stack + movl cygtls_slot(%rip),%r10d + movq %gs:0x1480(,%r10d,8),%r10 # location of bottom of stack 1: movl \$1,%r11d # potential lock value xchgl %r11d,$tls::stacklock(%r10) # see if we can grab it movl %r11d,$tls::spinning(%r10) # flag if we are waiting for lock @@ -167,7 +169,8 @@ _sigfe: # stack is aligned on entry! _sigbe: # return here after cygwin syscall # stack is aligned on entry! .seh_endprologue - movq %gs:8,%r10 # address of bottom of tls + movl cygtls_slot(%rip),%r10d + movq %gs:0x1480(,%r10d,8),%r10 # location of bottom of stack 1: movl \$1,%r11d # potential lock value xchgl %r11d,$tls::stacklock(%r10) # see if we can grab it movl %r11d,$tls::spinning(%r10) # flag if we are waiting for lock @@ -247,7 +250,8 @@ sigdelayed: movdqa %xmm0,0x20(%rsp) .seh_endprologue - movq %gs:8,%r12 # get tls + movl cygtls_slot(%rip),%r12d + movq %gs:0x1480(,%r12d,8),%r12 # location of bottom of stack movl $tls::saved_errno(%r12),%r15d # temporarily save saved_errno movq \$$tls::start_offset,%rcx # point to beginning of tls block addq %r12,%rcx # and store as first arg to method @@ -368,7 +372,8 @@ stabilize_sig_stack: subq \$0x20,%rsp .seh_stackalloc 32 .seh_endprologue - movq %gs:8,%r12 + movl cygtls_slot(%rip),%r12d + movq %gs:0x1480(,%r12d,8),%r12 # location of bottom of stack 1: movl \$1,%r10d xchgl %r10d,$tls::stacklock(%r12) movl %r10d,$tls::spinning(%r12) # flag if we are waiting for lock diff --git a/winsup/cygwin/include/cygwin/config.h b/winsup/cygwin/include/cygwin/config.h index 282637b67..498d6fdc6 100644 --- a/winsup/cygwin/include/cygwin/config.h +++ b/winsup/cygwin/include/cygwin/config.h @@ -46,7 +46,7 @@ extern inline struct _reent *__getreent (void) { register char *ret; #ifdef __x86_64__ - __asm __volatile__ ("movq %%gs:8,%0" : "=r" (ret)); + __asm __volatile__ ("movl cygtls_slot(%%rip),%%r10d\nmovq %%gs:0x1480(,%%r10d,8),%0" : "=r" (ret) : : "r10"); #else __asm __volatile__ ("movl %%fs:4,%0" : "=r" (ret)); #endif diff --git a/winsup/cygwin/init.cc b/winsup/cygwin/init.cc index 7787b164c..82b01b7aa 100644 --- a/winsup/cygwin/init.cc +++ b/winsup/cygwin/init.cc @@ -11,7 +11,7 @@ details. */ #include "ntdll.h" #include "shared_info.h" -static DWORD _my_oldfunc; +DWORD NO_COPY cygtls_slot; static char *search_for = (char *) cygthread::stub; unsigned threadfunc_ix[8]; @@ -25,7 +25,9 @@ __attribute__ ((force_align_arg_pointer)) static void WINAPI threadfunc_fe (VOID *arg) { - _cygtls::call ((DWORD (*) (void *, void *)) TlsGetValue (_my_oldfunc), arg); + PVOID f = TlsGetValue (cygtls_slot); + _set_tls(); + _cygtls::call ((DWORD (*) (void *, void *)) f, arg); } /* If possible, redirect the thread entry point to a cygwin routine which @@ -62,7 +64,7 @@ munge_threadfunc () for (i = 0; threadfunc_ix[i]; i++) if (!threadfunc || ebp[threadfunc_ix[i]] == threadfunc) ebp[threadfunc_ix[i]] = (char *) threadfunc_fe; - TlsSetValue (_my_oldfunc, threadfunc); + TlsSetValue (cygtls_slot, threadfunc); } } } @@ -81,6 +83,8 @@ dll_entry (HANDLE h, DWORD reason, void *static_load) switch (reason) { case DLL_PROCESS_ATTACH: + cygtls_slot = TlsAlloc (); + _set_tls(); init_console_handler (false); cygwin_hmodule = (HMODULE) h; @@ -97,7 +101,6 @@ dll_entry (HANDLE h, DWORD reason, void *static_load) memcpy (_REENT, _GLOBAL_REENT, sizeof (struct _reent)); dll_crt0_0 (); - _my_oldfunc = TlsAlloc (); dll_finished_loading = true; break; case DLL_PROCESS_DETACH: @@ -105,6 +108,7 @@ dll_entry (HANDLE h, DWORD reason, void *static_load) shared_destroy (); break; case DLL_THREAD_ATTACH: + _set_tls(); if (dll_finished_loading) munge_threadfunc (); break; diff --git a/winsup/cygwin/miscfuncs.cc b/winsup/cygwin/miscfuncs.cc index cba2140b6..090eb4cca 100644 --- a/winsup/cygwin/miscfuncs.cc +++ b/winsup/cygwin/miscfuncs.cc @@ -378,6 +378,7 @@ pthread_wrapper (PVOID arg) /* Set stack values in TEB */ PTEB teb = NtCurrentTeb (); teb->Tib.StackBase = wrapper_arg.stackbase; + _set_tls(); teb->Tib.StackLimit = wrapper_arg.stacklimit ?: wrapper_arg.stackaddr; /* Set DeallocationStack value. If we have an application-provided stack, we set DeallocationStack to NULL, so NtTerminateThread does not deallocate @@ -647,6 +648,7 @@ create_new_main_thread_stack (PVOID &allocationbase, SIZE_T parent_commitsize) return NULL; NtCurrentTeb()->Tib.StackBase = ((PBYTE) allocationbase + stacksize); NtCurrentTeb()->Tib.StackLimit = stacklimit; + _set_tls(); return ((PBYTE) allocationbase + stacksize - 16); } #endif -- 2.28.0 ============================================================ This still needs work obviously: - _set_tls() calls are scattered a little haphazardly - the asm will only work if TlsAlloc gets one of the first 64 TLS slots - I'm concerned about cygtls_slot being used before init (after fork?) Anyone have any thoughts on this?