public inbox for cygwin-patches@cygwin.com
 help / color / mirror / Atom feed
* Improvements to fork handling (1/5)
@ 2011-05-11 18:31 Ryan Johnson
  2011-05-17 11:14 ` Ryan Johnson
  2011-05-22  1:41 ` Christopher Faylor
  0 siblings, 2 replies; 8+ messages in thread
From: Ryan Johnson @ 2011-05-11 18:31 UTC (permalink / raw)
  To: cygwin-patches

[-- Attachment #1: Type: text/plain, Size: 1405 bytes --]

Hi all,

This is the first of a series of patches, sent in separate emails as 
requested.

The first patch allows a child which failed due to address space 
clobbers to report cleanly back to the parent. As a result, DLL_LINK 
which land wrong, DLL_LOAD whose space gets clobbered, and failure to 
replicate the cygheap, generate retries and dispense with the terminal 
spam. Handling of unexpected errors should not have changed. Further, 
the patch fixes several sources of access violations and crashes, 
including:
- accessing invalid state after failing to notice that a 
statically-linked dll loaded at the wrong location
- accessing invalid state while running dtors on a failed forkee. I 
follow cgf's approach of simply not running any dtors, based on the 
observation that dlls in the parent (gcc_s!) can store state about other 
dlls and crash trying to access that state in the child, even if they 
appeared to map properly in both processes.
- attempting to generate a stack trace when somebody in the call chain 
used alloca(). This one is only sidestepped here, because we eliminate 
the access violations and api_fatal calls which would have triggered the 
problematic stack traces. I have a separate patch which allows offending 
functions to disable stack traces, if folks are interested, but it was 
kind of noisy so I left it out for now (cygwin uses alloca pretty 
liberally!).

Ryan

[-- Attachment #2: fork-clean-exit.patch --]
[-- Type: text/plain, Size: 5013 bytes --]

diff --git a/child_info.h b/child_info.h
--- a/child_info.h
+++ b/child_info.h
@@ -92,6 +92,18 @@ public:
   void alloc_stack_hard_way (volatile char *);
 };
 
+/* Several well-known problems can prevent us from patching up a
+   forkee; when such errors arise the child should exit cleanly (with
+   a failure code for the parent) rather than dumping stack.  */
+#define fork_api_fatal(fmt, args...)					\
+  do									\
+    {									\
+      sigproc_printf (fmt,## args);					\
+      fork_info->handle_failure (-1);					\
+    }									\
+  while(0)
+    
+
 class fhandler_base;
 
 class cygheap_exec_info
diff --git a/dll_init.cc b/dll_init.cc
--- a/dll_init.cc
+++ b/dll_init.cc
@@ -19,6 +19,7 @@ details. */
 #include "dtable.h"
 #include "cygheap.h"
 #include "pinfo.h"
+#include "child_info.h"
 #include "cygtls.h"
 #include "exception.h"
 #include <wchar.h>
@@ -131,10 +132,16 @@ dll_list::alloc (HINSTANCE h, per_proces
     {
       if (!in_forkee)
 	d->count++;	/* Yes.  Bump the usage count. */
+      else if (d->handle != h)
+	fork_api_fatal ("Location of %W changed from %p (parent) to %p (child)",
+			d->name, d->handle, h);
       d->p = p;
     }
   else
     {
+      if (in_forkee)
+	system_printf ("Unexpected dll loaded during fork: %W", name);
+      
       /* FIXME: Change this to new at some point. */
       d = (dll *) cmalloc (HEAP_2_DLL, sizeof (*d) + (namelen * sizeof (*name)));
 
@@ -371,7 +378,6 @@ dll_list::load_after_fork (HANDLE parent
             preferred_block = reserve_at (d->name, (DWORD) h);
 
 	}
-  in_forkee = false;
 }
 
 struct dllcrt0_info
diff --git a/dll_init.h b/dll_init.h
--- a/dll_init.h
+++ b/dll_init.h
@@ -57,7 +57,7 @@ struct dll
   int init ();
   void run_dtors ()
   {
-    if (has_dtors)
+    if (has_dtors && !in_forkee)
       {
 	has_dtors = 0;
 	p.run_dtors ();
diff --git a/fork.cc b/fork.cc
--- a/fork.cc
+++ b/fork.cc
@@ -233,6 +233,7 @@ frok::child (volatile char * volatile he
       sync_with_parent ("loaded dlls", true);
     }
 
+  in_forkee = false;
   init_console_handler (myself->ctty >= 0);
   ForceCloseHandle1 (fork_info->forker_finished, forker_finished);
 
@@ -393,10 +394,13 @@ frok::parent (volatile char * volatile s
 	  if (!exit_code)
 	    continue;
 	  this_errno = EAGAIN;
-	  /* Not thread safe, but do we care? */
-	  __small_sprintf (errbuf, "died waiting for longjmp before initialization, "
-			   "retry %d, exit code %p", ch.retry, exit_code);
-	  error = errbuf;
+	  if (exit_code != EXITCODE_FORK_FAILED)
+	    {
+	      /* Not thread safe, but do we care? */
+	      __small_sprintf (errbuf, "died waiting for longjmp before initialization, "
+			       "retry %d, exit code %p", ch.retry, exit_code);
+	      error = errbuf;
+	    }
 	  goto cleanup;
 	}
       break;
@@ -515,7 +519,8 @@ frok::parent (volatile char * volatile s
   if (!ch.sync (child->pid, pi.hProcess, FORK_WAIT_TIMEOUT))
     {
       this_errno = EAGAIN;
-      error = "died waiting for dll loading";
+      if (ch.exit_code != EXITCODE_FORK_FAILED)
+	  error = "died waiting for dll loading";
       goto cleanup;
     }
 
diff --git a/heap.cc b/heap.cc
--- a/heap.cc
+++ b/heap.cc
@@ -88,11 +88,11 @@ heap_init ()
 	  if ((reserve_size -= page_const) < allocsize)
 	    break;
 	}
-      if (!p && in_forkee && !fork_info->handle_failure (GetLastError ()))
-	api_fatal ("couldn't allocate heap, %E, base %p, top %p, "
-		   "reserve_size %d, allocsize %d, page_const %d",
-		   cygheap->user_heap.base, cygheap->user_heap.top,
-		   reserve_size, allocsize, page_const);
+      if (!p)
+	fork_api_fatal ("couldn't allocate heap, %E, base %p, top %p, "
+			"reserve_size %d, allocsize %d, page_const %d",
+			cygheap->user_heap.base, cygheap->user_heap.top,
+			reserve_size, allocsize, page_const);
       if (p != cygheap->user_heap.base)
 	api_fatal ("heap allocated at wrong address %p (mapped) != %p (expected)", p, cygheap->user_heap.base);
       if (allocsize && !VirtualAlloc (cygheap->user_heap.base, allocsize, MEM_COMMIT, PAGE_READWRITE))
diff --git a/pinfo.h b/pinfo.h
--- a/pinfo.h
+++ b/pinfo.h
@@ -36,6 +36,7 @@ enum picom
 #define EXITCODE_NOSET	0x4000000
 #define EXITCODE_RETRY	0x2000000
 #define EXITCODE_OK	0x1000000
+#define EXITCODE_FORK_FAILED 0x2200000
 
 class fhandler_pipe;
 
diff --git a/sigproc.cc b/sigproc.cc
--- a/sigproc.cc
+++ b/sigproc.cc
@@ -914,6 +914,9 @@ child_info::proc_retry (HANDLE h)
       if (retry-- > 0)
 	exit_code = 0;
       break;
+    case EXITCODE_FORK_FAILED: /* windows prevented us from forking */
+      break;
+      
     /* Count down non-recognized exit codes more quickly since they aren't
        due to known conditions.  */
     default:
@@ -932,8 +935,13 @@ child_info::proc_retry (HANDLE h)
 bool
 child_info_fork::handle_failure (DWORD err)
 {
-  if (retry > 0)
+  if (in_forkee && retry > 0)
     ExitProcess (EXITCODE_RETRY);
+  else
+    {
+      sigproc_printf ("Unable to fork.");
+      ExitProcess (EXITCODE_FORK_FAILED);
+    }
   return 0;
 }
 

[-- Attachment #3: fork-clean-exit.changes --]
[-- Type: text/plain, Size: 1120 bytes --]

        * child_info.h (fork_api_fatal): new macro for reporting to the
        parent of a forked child that the fork cannot continue.
        * dll_init.cc (dll_list::alloc): added code which aborts a fork
        attempt if DLL_LINK dlls landed at the wrong base address in the
        child.
        (dll_list::load_after_fork): no longer set in_forkee=false
        * dll_init.h (dll::run_dtors): don't do anything if
        in_forkee=true. Prevents access violations during process teardown
        when a fork attempt fails because of address space clobbers.
        * fork.cc (frok::child): Set in_forkee=false at end of fork even
        if no DLL_LOAD are present.
        (frok::parent): Detect when forkee exited due to address space
        clobbers and retry.
        * heap.cc (heap_init): use fork_api_fatal for clean exit/retry of
        failed fork.
        * pinfo.h: define new EXITCODE_FORK_FAILED to aid clean exit/retry
        of failed forks.
        * sigproc.cc (child_info::proc_retry): Deal with clean exit/retry
        of failed forkee.
        (child_info_fork::handle_failure): ditto.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improvements to fork handling (1/5)
  2011-05-11 18:31 Improvements to fork handling (1/5) Ryan Johnson
@ 2011-05-17 11:14 ` Ryan Johnson
  2011-05-17 16:45   ` Christopher Faylor
  2011-05-22  1:41 ` Christopher Faylor
  1 sibling, 1 reply; 8+ messages in thread
From: Ryan Johnson @ 2011-05-17 11:14 UTC (permalink / raw)
  To: cygwin-patches

Any feedback on these patches?

On 11/05/2011 2:31 PM, Ryan Johnson wrote:
> Hi all,
>
> This is the first of a series of patches, sent in separate emails as 
> requested.
>
> The first patch allows a child which failed due to address space 
> clobbers to report cleanly back to the parent. As a result, DLL_LINK 
> which land wrong, DLL_LOAD whose space gets clobbered, and failure to 
> replicate the cygheap, generate retries and dispense with the terminal 
> spam. Handling of unexpected errors should not have changed. Further, 
> the patch fixes several sources of access violations and crashes, 
> including:
> - accessing invalid state after failing to notice that a 
> statically-linked dll loaded at the wrong location
> - accessing invalid state while running dtors on a failed forkee. I 
> follow cgf's approach of simply not running any dtors, based on the 
> observation that dlls in the parent (gcc_s!) can store state about 
> other dlls and crash trying to access that state in the child, even if 
> they appeared to map properly in both processes.
> - attempting to generate a stack trace when somebody in the call chain 
> used alloca(). This one is only sidestepped here, because we eliminate 
> the access violations and api_fatal calls which would have triggered 
> the problematic stack traces. I have a separate patch which allows 
> offending functions to disable stack traces, if folks are interested, 
> but it was kind of noisy so I left it out for now (cygwin uses alloca 
> pretty liberally!).
>
> Ryan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improvements to fork handling (1/5)
  2011-05-17 11:14 ` Ryan Johnson
@ 2011-05-17 16:45   ` Christopher Faylor
  0 siblings, 0 replies; 8+ messages in thread
From: Christopher Faylor @ 2011-05-17 16:45 UTC (permalink / raw)
  To: cygwin-patches

On Tue, May 17, 2011 at 07:14:12AM -0400, Ryan Johnson wrote:
>Any feedback on these patches?

I'll get to them eventually but I have some other Cygwin stuff which is
occupying my attention currently.

cgf

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improvements to fork handling (1/5)
  2011-05-11 18:31 Improvements to fork handling (1/5) Ryan Johnson
  2011-05-17 11:14 ` Ryan Johnson
@ 2011-05-22  1:41 ` Christopher Faylor
  2011-05-22 13:05   ` Ryan Johnson
  1 sibling, 1 reply; 8+ messages in thread
From: Christopher Faylor @ 2011-05-22  1:41 UTC (permalink / raw)
  To: cygwin-patches

On Wed, May 11, 2011 at 02:31:23PM -0400, Ryan Johnson wrote:
>Hi all,
>
>This is the first of a series of patches, sent in separate emails as 
>requested.
>
>The first patch allows a child which failed due to address space 
>clobbers to report cleanly back to the parent. As a result, DLL_LINK 
>which land wrong, DLL_LOAD whose space gets clobbered, and failure to 
>replicate the cygheap, generate retries and dispense with the terminal 
>spam. Handling of unexpected errors should not have changed. Further, 
>the patch fixes several sources of access violations and crashes, 
>including:
>- accessing invalid state after failing to notice that a 
>statically-linked dll loaded at the wrong location
>- accessing invalid state while running dtors on a failed forkee. I 
>follow cgf's approach of simply not running any dtors, based on the 
>observation that dlls in the parent (gcc_s!) can store state about other 
>dlls and crash trying to access that state in the child, even if they 
>appeared to map properly in both processes.
>- attempting to generate a stack trace when somebody in the call chain 
>used alloca(). This one is only sidestepped here, because we eliminate 
>the access violations and api_fatal calls which would have triggered the 
>problematic stack traces. I have a separate patch which allows offending 
>functions to disable stack traces, if folks are interested, but it was 
>kind of noisy so I left it out for now (cygwin uses alloca pretty 
>liberally!).
>
>Ryan

>diff --git a/child_info.h b/child_info.h
>--- a/child_info.h
>+++ b/child_info.h
>@@ -92,6 +92,18 @@ public:
>   void alloc_stack_hard_way (volatile char *);
> };
> 
>+/* Several well-known problems can prevent us from patching up a
>+   forkee; when such errors arise the child should exit cleanly (with
>+   a failure code for the parent) rather than dumping stack.  */
>+#define fork_api_fatal(fmt, args...)					\
>+  do									\
>+    {									\
>+      sigproc_printf (fmt,## args);					\
>+      fork_info->handle_failure (-1);					\
>+    }									\
>+  while(0)
>+    
>+
> class fhandler_base;
> 
> class cygheap_exec_info
>diff --git a/dll_init.cc b/dll_init.cc
>--- a/dll_init.cc
>+++ b/dll_init.cc
>@@ -19,6 +19,7 @@ details. */
> #include "dtable.h"
> #include "cygheap.h"
> #include "pinfo.h"
>+#include "child_info.h"
> #include "cygtls.h"
> #include "exception.h"
> #include <wchar.h>
>@@ -131,10 +132,16 @@ dll_list::alloc (HINSTANCE h, per_proces
>     {
>       if (!in_forkee)
> 	d->count++;	/* Yes.  Bump the usage count. */
>+      else if (d->handle != h)
>+	fork_api_fatal ("Location of %W changed from %p (parent) to %p (child)",
>+			d->name, d->handle, h);

You seem to be guranteeing a failure in a condition which could conceivably work
ok for simple applications, i.e., if a dll loads in a different location that
is not necessarily going to cause a problem.

>       d->p = p;
>     }
>   else
>     {
>+      if (in_forkee)
>+	system_printf ("Unexpected dll loaded during fork: %W", name);
>+      
>       /* FIXME: Change this to new at some point. */
>       d = (dll *) cmalloc (HEAP_2_DLL, sizeof (*d) + (namelen * sizeof (*name)));
> 
>@@ -371,7 +378,6 @@ dll_list::load_after_fork (HANDLE parent
>             preferred_block = reserve_at (d->name, (DWORD) h);
> 
> 	}
>-  in_forkee = false;
> }
> 
> struct dllcrt0_info
>diff --git a/dll_init.h b/dll_init.h
>--- a/dll_init.h
>+++ b/dll_init.h
>@@ -57,7 +57,7 @@ struct dll
>   int init ();
>   void run_dtors ()
>   {
>-    if (has_dtors)
>+    if (has_dtors && !in_forkee)
>       {
> 	has_dtors = 0;
> 	p.run_dtors ();

Isn't this already handled in dll_init.cc?

>diff --git a/fork.cc b/fork.cc
>--- a/fork.cc
>+++ b/fork.cc
>@@ -233,6 +233,7 @@ frok::child (volatile char * volatile he
>       sync_with_parent ("loaded dlls", true);
>     }
> 
>+  in_forkee = false;
>   init_console_handler (myself->ctty >= 0);
>   ForceCloseHandle1 (fork_info->forker_finished, forker_finished);
> 
>@@ -393,10 +394,13 @@ frok::parent (volatile char * volatile s
> 	  if (!exit_code)
> 	    continue;
> 	  this_errno = EAGAIN;
>-	  /* Not thread safe, but do we care? */
>-	  __small_sprintf (errbuf, "died waiting for longjmp before initialization, "
>-			   "retry %d, exit code %p", ch.retry, exit_code);
>-	  error = errbuf;
>+	  if (exit_code != EXITCODE_FORK_FAILED)
>+	    {
>+	      /* Not thread safe, but do we care? */
>+	      __small_sprintf (errbuf, "died waiting for longjmp before initialization, "
>+			       "retry %d, exit code %p", ch.retry, exit_code);
>+	      error = errbuf;
>+	    }
> 	  goto cleanup;
> 	}
>       break;
>@@ -515,7 +519,8 @@ frok::parent (volatile char * volatile s
>   if (!ch.sync (child->pid, pi.hProcess, FORK_WAIT_TIMEOUT))
>     {
>       this_errno = EAGAIN;
>-      error = "died waiting for dll loading";
>+      if (ch.exit_code != EXITCODE_FORK_FAILED)
>+	  error = "died waiting for dll loading";
>       goto cleanup;
>     }
> 
>diff --git a/heap.cc b/heap.cc
>--- a/heap.cc
>+++ b/heap.cc
>@@ -88,11 +88,11 @@ heap_init ()
> 	  if ((reserve_size -= page_const) < allocsize)
> 	    break;
> 	}
>-      if (!p && in_forkee && !fork_info->handle_failure (GetLastError ()))
>-	api_fatal ("couldn't allocate heap, %E, base %p, top %p, "
>-		   "reserve_size %d, allocsize %d, page_const %d",
>-		   cygheap->user_heap.base, cygheap->user_heap.top,
>-		   reserve_size, allocsize, page_const);
>+      if (!p)
>+	fork_api_fatal ("couldn't allocate heap, %E, base %p, top %p, "
>+			"reserve_size %d, allocsize %d, page_const %d",
>+			cygheap->user_heap.base, cygheap->user_heap.top,
>+			reserve_size, allocsize, page_const);

Why is the "in_forkee" dropped here?

cgf

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improvements to fork handling (1/5)
  2011-05-22  1:41 ` Christopher Faylor
@ 2011-05-22 13:05   ` Ryan Johnson
  2011-05-22 20:29     ` Christopher Faylor
  0 siblings, 1 reply; 8+ messages in thread
From: Ryan Johnson @ 2011-05-22 13:05 UTC (permalink / raw)
  To: cygwin-patches

On 21/05/2011 9:41 PM, Christopher Faylor wrote:
> On Wed, May 11, 2011 at 02:31:23PM -0400, Ryan Johnson wrote:
>> Hi all,
>>
>> This is the first of a series of patches, sent in separate emails as
>> requested.
>>
>> The first patch allows a child which failed due to address space
>> clobbers to report cleanly back to the parent. As a result, DLL_LINK
>> which land wrong, DLL_LOAD whose space gets clobbered, and failure to
>> replicate the cygheap, generate retries and dispense with the terminal
>> spam. Handling of unexpected errors should not have changed. Further,
>> the patch fixes several sources of access violations and crashes,
>> including:
>> - accessing invalid state after failing to notice that a
>> statically-linked dll loaded at the wrong location
>> - accessing invalid state while running dtors on a failed forkee. I
>> follow cgf's approach of simply not running any dtors, based on the
>> observation that dlls in the parent (gcc_s!) can store state about other
>> dlls and crash trying to access that state in the child, even if they
>> appeared to map properly in both processes.
>> - attempting to generate a stack trace when somebody in the call chain
>> used alloca(). This one is only sidestepped here, because we eliminate
>> the access violations and api_fatal calls which would have triggered the
>> problematic stack traces. I have a separate patch which allows offending
>> functions to disable stack traces, if folks are interested, but it was
>> kind of noisy so I left it out for now (cygwin uses alloca pretty
>> liberally!).
>>
>> Ryan
>> diff --git a/child_info.h b/child_info.h
>> --- a/child_info.h
>> +++ b/child_info.h
>> @@ -92,6 +92,18 @@ public:
>>    void alloc_stack_hard_way (volatile char *);
>> };
>>
>> +/* Several well-known problems can prevent us from patching up a
>> +   forkee; when such errors arise the child should exit cleanly (with
>> +   a failure code for the parent) rather than dumping stack.  */
>> +#define fork_api_fatal(fmt, args...)					\
>> +  do									\
>> +    {									\
>> +      sigproc_printf (fmt,## args);					\
>> +      fork_info->handle_failure (-1);					\
>> +    }									\
>> +  while(0)
>> +
>> +
>> class fhandler_base;
>>
>> class cygheap_exec_info
>> diff --git a/dll_init.cc b/dll_init.cc
>> --- a/dll_init.cc
>> +++ b/dll_init.cc
>> @@ -19,6 +19,7 @@ details. */
>> #include "dtable.h"
>> #include "cygheap.h"
>> #include "pinfo.h"
>> +#include "child_info.h"
>> #include "cygtls.h"
>> #include "exception.h"
>> #include<wchar.h>
>> @@ -131,10 +132,16 @@ dll_list::alloc (HINSTANCE h, per_proces
>>      {
>>        if (!in_forkee)
>> 	d->count++;	/* Yes.  Bump the usage count. */
>> +      else if (d->handle != h)
>> +	fork_api_fatal ("Location of %W changed from %p (parent) to %p (child)",
>> +			d->name, d->handle, h);
> You seem to be guranteeing a failure in a condition which could conceivably work
> ok for simple applications, i.e., if a dll loads in a different location that
> is not necessarily going to cause a problem.
By fork semantics the condition *is* a failure. If we try to relax the 
requirement we risk Bad Things happening, usually in hard-to-diagnose 
ways. The example I have right off is libgcc_s storing pointers to other 
dlls and seg faulting when it tries to access pointers which were valid 
in the parent but not in the child. I prefer a fail-fast solution over 
cross-fingers-and-hope-it-doesn't-happen-to-me.

As it is, I'm pretty nervous that Bad Things could happen at some point 
with windows dlls mapping to the wrong location (we detect that only 
when they clobber something cygwin needs), but we can hope that few apps 
which fork() are also heavy windows dlls.... and cross our fingers.

>>        d->p = p;
>>      }
>>    else
>>      {
>> +      if (in_forkee)
>> +	system_printf ("Unexpected dll loaded during fork: %W", name);
>> +
>>        /* FIXME: Change this to new at some point. */
>>        d = (dll *) cmalloc (HEAP_2_DLL, sizeof (*d) + (namelen * sizeof (*name)));
>>
>> @@ -371,7 +378,6 @@ dll_list::load_after_fork (HANDLE parent
>>              preferred_block = reserve_at (d->name, (DWORD) h);
>>
>> 	}
>> -  in_forkee = false;
>> }
>>
>> struct dllcrt0_info
>> diff --git a/dll_init.h b/dll_init.h
>> --- a/dll_init.h
>> +++ b/dll_init.h
>> @@ -57,7 +57,7 @@ struct dll
>>    int init ();
>>    void run_dtors ()
>>    {
>> -    if (has_dtors)
>> +    if (has_dtors&&  !in_forkee)
>>        {
>> 	has_dtors = 0;
>> 	p.run_dtors ();
> Isn't this already handled in dll_init.cc?
Yes. I didn't notice you had checked that into CVS (you hadn't yet the 
time I did look). However, the above does have the advantage of residing 
in one location rather than 2+.

>> diff --git a/fork.cc b/fork.cc
>> --- a/fork.cc
>> +++ b/fork.cc
>> @@ -233,6 +233,7 @@ frok::child (volatile char * volatile he
>>        sync_with_parent ("loaded dlls", true);
>>      }
>>
>> +  in_forkee = false;
>>    init_console_handler (myself->ctty>= 0);
>>    ForceCloseHandle1 (fork_info->forker_finished, forker_finished);
>>
>> @@ -393,10 +394,13 @@ frok::parent (volatile char * volatile s
>> 	  if (!exit_code)
>> 	    continue;
>> 	  this_errno = EAGAIN;
>> -	  /* Not thread safe, but do we care? */
>> -	  __small_sprintf (errbuf, "died waiting for longjmp before initialization, "
>> -			   "retry %d, exit code %p", ch.retry, exit_code);
>> -	  error = errbuf;
>> +	  if (exit_code != EXITCODE_FORK_FAILED)
>> +	    {
>> +	      /* Not thread safe, but do we care? */
>> +	      __small_sprintf (errbuf, "died waiting for longjmp before initialization, "
>> +			       "retry %d, exit code %p", ch.retry, exit_code);
>> +	      error = errbuf;
>> +	    }
>> 	  goto cleanup;
>> 	}
>>        break;
>> @@ -515,7 +519,8 @@ frok::parent (volatile char * volatile s
>>    if (!ch.sync (child->pid, pi.hProcess, FORK_WAIT_TIMEOUT))
>>      {
>>        this_errno = EAGAIN;
>> -      error = "died waiting for dll loading";
>> +      if (ch.exit_code != EXITCODE_FORK_FAILED)
>> +	  error = "died waiting for dll loading";
>>        goto cleanup;
>>      }
>>
>> diff --git a/heap.cc b/heap.cc
>> --- a/heap.cc
>> +++ b/heap.cc
>> @@ -88,11 +88,11 @@ heap_init ()
>> 	  if ((reserve_size -= page_const)<  allocsize)
>> 	    break;
>> 	}
>> -      if (!p&&  in_forkee&&  !fork_info->handle_failure (GetLastError ()))
>> -	api_fatal ("couldn't allocate heap, %E, base %p, top %p, "
>> -		   "reserve_size %d, allocsize %d, page_const %d",
>> -		   cygheap->user_heap.base, cygheap->user_heap.top,
>> -		   reserve_size, allocsize, page_const);
>> +      if (!p)
>> +	fork_api_fatal ("couldn't allocate heap, %E, base %p, top %p, "
>> +			"reserve_size %d, allocsize %d, page_const %d",
>> +			cygheap->user_heap.base, cygheap->user_heap.top,
>> +			reserve_size, allocsize, page_const);
> Why is the "in_forkee" dropped here?
Good catch. At one point handle_failure() checked the condition to avoid 
duplicating code (all callers of fork_api_fatal need to check for 
in_forkee one way or another), but that seems to have gotten lost in the 
shuffle.

Ryan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improvements to fork handling (1/5)
  2011-05-22 13:05   ` Ryan Johnson
@ 2011-05-22 20:29     ` Christopher Faylor
  2011-05-23  1:36       ` Christopher Faylor
  2011-05-24 17:49       ` Ryan Johnson
  0 siblings, 2 replies; 8+ messages in thread
From: Christopher Faylor @ 2011-05-22 20:29 UTC (permalink / raw)
  To: cygwin-patches

On Sun, May 22, 2011 at 09:04:40AM -0400, Ryan Johnson wrote:
>On 21/05/2011 9:41 PM, Christopher Faylor wrote:
>> On Wed, May 11, 2011 at 02:31:23PM -0400, Ryan Johnson wrote:
>>> Hi all,
>>>
>>> This is the first of a series of patches, sent in separate emails as
>>> requested.
>>>
>>> The first patch allows a child which failed due to address space
>>> clobbers to report cleanly back to the parent. As a result, DLL_LINK
>>> which land wrong, DLL_LOAD whose space gets clobbered, and failure to
>>> replicate the cygheap, generate retries and dispense with the terminal
>>> spam. Handling of unexpected errors should not have changed. Further,
>>> the patch fixes several sources of access violations and crashes,
>>> including:
>>> - accessing invalid state after failing to notice that a
>>> statically-linked dll loaded at the wrong location
>>> - accessing invalid state while running dtors on a failed forkee. I
>>> follow cgf's approach of simply not running any dtors, based on the
>>> observation that dlls in the parent (gcc_s!) can store state about other
>>> dlls and crash trying to access that state in the child, even if they
>>> appeared to map properly in both processes.
>>> - attempting to generate a stack trace when somebody in the call chain
>>> used alloca(). This one is only sidestepped here, because we eliminate
>>> the access violations and api_fatal calls which would have triggered the
>>> problematic stack traces. I have a separate patch which allows offending
>>> functions to disable stack traces, if folks are interested, but it was
>>> kind of noisy so I left it out for now (cygwin uses alloca pretty
>>> liberally!).
>>>
>>> Ryan
>>> diff --git a/child_info.h b/child_info.h
>>> --- a/child_info.h
>>> +++ b/child_info.h
>>> @@ -92,6 +92,18 @@ public:
>>>    void alloc_stack_hard_way (volatile char *);
>>> };
>>>
>>> +/* Several well-known problems can prevent us from patching up a
>>> +   forkee; when such errors arise the child should exit cleanly (with
>>> +   a failure code for the parent) rather than dumping stack.  */
>>> +#define fork_api_fatal(fmt, args...)					\
>>> +  do									\
>>> +    {									\
>>> +      sigproc_printf (fmt,## args);					\
>>> +      fork_info->handle_failure (-1);					\
>>> +    }									\
>>> +  while(0)
>>> +
>>> +
>>> class fhandler_base;
>>>
>>> class cygheap_exec_info
>>> diff --git a/dll_init.cc b/dll_init.cc
>>> --- a/dll_init.cc
>>> +++ b/dll_init.cc
>>> @@ -19,6 +19,7 @@ details. */
>>> #include "dtable.h"
>>> #include "cygheap.h"
>>> #include "pinfo.h"
>>> +#include "child_info.h"
>>> #include "cygtls.h"
>>> #include "exception.h"
>>> #include<wchar.h>
>>> @@ -131,10 +132,16 @@ dll_list::alloc (HINSTANCE h, per_proces
>>>      {
>>>        if (!in_forkee)
>>> 	d->count++;	/* Yes.  Bump the usage count. */
>>> +      else if (d->handle != h)
>>> +	fork_api_fatal ("Location of %W changed from %p (parent) to %p (child)",
>>> +			d->name, d->handle, h);
>> You seem to be guranteeing a failure in a condition which could conceivably work
>> ok for simple applications, i.e., if a dll loads in a different location that
>> is not necessarily going to cause a problem.
>By fork semantics the condition *is* a failure. If we try to relax the 
>requirement we risk Bad Things happening, usually in hard-to-diagnose 
>ways. The example I have right off is libgcc_s storing pointers to other 
>dlls and seg faulting when it tries to access pointers which were valid 
>in the parent but not in the child. I prefer a fail-fast solution over 
>cross-fingers-and-hope-it-doesn't-happen-to-me.

When you add a failure case like this you are assuming that you
understand all of the parameters and that it will make things better.  I
am not convinced that this won't cause previously working cases to fail.

It is not inconceivable that a DLL could be relocated into another
location and continue to work in a forked process.  Yes, I know this
doesn't match the way fork is supposed to work but I'm not as concerned
about that as I am about Cygwin mailing list complaints about new
failures.

The reason I'm objecting to this is because I've considered, from time
to time, adding a similar test but have always avoided it because I
couldn't convince myself that it would help more than it would hurt.  If
we are going to add tests, I'd prefer that the testing be done in
frok:parent when the child_copy happens for static and dynamic dlls,
maybe by adding a dll function which first checks that the data/bss can
be copied to the same location as the parent.

>As it is, I'm pretty nervous that Bad Things could happen at some point 
>with windows dlls mapping to the wrong location (we detect that only 
>when they clobber something cygwin needs), but we can hope that few apps 
>which fork() are also heavy windows dlls.... and cross our fingers.
>
>>>        d->p = p;
>>>      }
>>>    else
>>>      {
>>> +      if (in_forkee)
>>> +	system_printf ("Unexpected dll loaded during fork: %W", name);
>>> +
>>>        /* FIXME: Change this to new at some point. */
>>>        d = (dll *) cmalloc (HEAP_2_DLL, sizeof (*d) + (namelen * sizeof (*name)));
>>>
>>> @@ -371,7 +378,6 @@ dll_list::load_after_fork (HANDLE parent
>>>              preferred_block = reserve_at (d->name, (DWORD) h);
>>>
>>> 	}
>>> -  in_forkee = false;
>>> }
>>>
>>> struct dllcrt0_info
>>> diff --git a/dll_init.h b/dll_init.h
>>> --- a/dll_init.h
>>> +++ b/dll_init.h
>>> @@ -57,7 +57,7 @@ struct dll
>>>    int init ();
>>>    void run_dtors ()
>>>    {
>>> -    if (has_dtors)
>>> +    if (has_dtors&&  !in_forkee)
>>>        {
>>> 	has_dtors = 0;
>>> 	p.run_dtors ();
>> Isn't this already handled in dll_init.cc?
>Yes. I didn't notice you had checked that into CVS (you hadn't yet the 
>time I did look). However, the above does have the advantage of residing 
>in one location rather than 2+.

My change was dated 2011/05/04.  That predates your patch by a week.

It's not an advantage to dive into the function multiple times when you
can short-circuit repeated attempts to call it once.

cgf

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improvements to fork handling (1/5)
  2011-05-22 20:29     ` Christopher Faylor
@ 2011-05-23  1:36       ` Christopher Faylor
  2011-05-24 17:49       ` Ryan Johnson
  1 sibling, 0 replies; 8+ messages in thread
From: Christopher Faylor @ 2011-05-23  1:36 UTC (permalink / raw)
  To: cygwin-patches

On Sun, May 22, 2011 at 04:29:18PM -0400, Christopher Faylor wrote:
>...I'd prefer that the testing be done in frok:parent when the
>child_copy happens for static and dynamic dlls, maybe by adding a dll
>function which first checks that the data/bss can be copied to the same
>location as the parent.

Actually, no, I think there's a better way to do that.  I'll check something
in in the next couple of days.

I think the rest of your patches have merit too so I'll be checking those
in as well.  Thanks for all of your work.

cgf

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improvements to fork handling (1/5)
  2011-05-22 20:29     ` Christopher Faylor
  2011-05-23  1:36       ` Christopher Faylor
@ 2011-05-24 17:49       ` Ryan Johnson
  1 sibling, 0 replies; 8+ messages in thread
From: Ryan Johnson @ 2011-05-24 17:49 UTC (permalink / raw)
  To: cygwin-patches

On 22/05/2011 4:29 PM, Christopher Faylor wrote:
> On Sun, May 22, 2011 at 09:04:40AM -0400, Ryan Johnson wrote:
>> On 21/05/2011 9:41 PM, Christopher Faylor wrote:
>>> On Wed, May 11, 2011 at 02:31:23PM -0400, Ryan Johnson wrote:
>>>> Hi all,
>>>>
>>>> This is the first of a series of patches, sent in separate emails as
>>>> requested.
>>>>
>>>> The first patch allows a child which failed due to address space
>>>> clobbers to report cleanly back to the parent. As a result, DLL_LINK
>>>> which land wrong, DLL_LOAD whose space gets clobbered, and failure to
>>>> replicate the cygheap, generate retries and dispense with the terminal
>>>> spam. Handling of unexpected errors should not have changed. Further,
>>>> the patch fixes several sources of access violations and crashes,
>>>> including:
>>>> - accessing invalid state after failing to notice that a
>>>> statically-linked dll loaded at the wrong location
>>>> - accessing invalid state while running dtors on a failed forkee. I
>>>> follow cgf's approach of simply not running any dtors, based on the
>>>> observation that dlls in the parent (gcc_s!) can store state about other
>>>> dlls and crash trying to access that state in the child, even if they
>>>> appeared to map properly in both processes.
>>>> - attempting to generate a stack trace when somebody in the call chain
>>>> used alloca(). This one is only sidestepped here, because we eliminate
>>>> the access violations and api_fatal calls which would have triggered the
>>>> problematic stack traces. I have a separate patch which allows offending
>>>> functions to disable stack traces, if folks are interested, but it was
>>>> kind of noisy so I left it out for now (cygwin uses alloca pretty
>>>> liberally!).
>>>>
>>>> Ryan
>>>> diff --git a/child_info.h b/child_info.h
>>>> --- a/child_info.h
>>>> +++ b/child_info.h
>>>> @@ -92,6 +92,18 @@ public:
>>>>     void alloc_stack_hard_way (volatile char *);
>>>> };
>>>>
>>>> +/* Several well-known problems can prevent us from patching up a
>>>> +   forkee; when such errors arise the child should exit cleanly (with
>>>> +   a failure code for the parent) rather than dumping stack.  */
>>>> +#define fork_api_fatal(fmt, args...)					\
>>>> +  do									\
>>>> +    {									\
>>>> +      sigproc_printf (fmt,## args);					\
>>>> +      fork_info->handle_failure (-1);					\
>>>> +    }									\
>>>> +  while(0)
>>>> +
>>>> +
>>>> class fhandler_base;
>>>>
>>>> class cygheap_exec_info
>>>> diff --git a/dll_init.cc b/dll_init.cc
>>>> --- a/dll_init.cc
>>>> +++ b/dll_init.cc
>>>> @@ -19,6 +19,7 @@ details. */
>>>> #include "dtable.h"
>>>> #include "cygheap.h"
>>>> #include "pinfo.h"
>>>> +#include "child_info.h"
>>>> #include "cygtls.h"
>>>> #include "exception.h"
>>>> #include<wchar.h>
>>>> @@ -131,10 +132,16 @@ dll_list::alloc (HINSTANCE h, per_proces
>>>>       {
>>>>         if (!in_forkee)
>>>> 	d->count++;	/* Yes.  Bump the usage count. */
>>>> +      else if (d->handle != h)
>>>> +	fork_api_fatal ("Location of %W changed from %p (parent) to %p (child)",
>>>> +			d->name, d->handle, h);
>>> You seem to be guranteeing a failure in a condition which could conceivably work
>>> ok for simple applications, i.e., if a dll loads in a different location that
>>> is not necessarily going to cause a problem.
>> By fork semantics the condition *is* a failure. If we try to relax the
>> requirement we risk Bad Things happening, usually in hard-to-diagnose
>> ways. The example I have right off is libgcc_s storing pointers to other
>> dlls and seg faulting when it tries to access pointers which were valid
>> in the parent but not in the child. I prefer a fail-fast solution over
>> cross-fingers-and-hope-it-doesn't-happen-to-me.
> When you add a failure case like this you are assuming that you
> understand all of the parameters and that it will make things better.  I
> am not convinced that this won't cause previously working cases to fail.
>
> It is not inconceivable that a DLL could be relocated into another
> location and continue to work in a forked process.  Yes, I know this
> doesn't match the way fork is supposed to work but I'm not as concerned
> about that as I am about Cygwin mailing list complaints about new
> failures.
WJM?

> The reason I'm objecting to this is because I've considered, from time
> to time, adding a similar test but have always avoided it because I
> couldn't convince myself that it would help more than it would hurt.  If
> we are going to add tests, I'd prefer that the testing be done in
> frok:parent when the child_copy happens for static and dynamic dlls,
> maybe by adding a dll function which first checks that the data/bss can
> be copied to the same location as the parent.
Maybe the new way you mentioned you're working on obviates the above, but:

1. We won't necessarily get as far as frok::parent. Both the above check 
and the access violations it avoids usually happen before cygwin1.dll 
initializes.

2. Because it runs before child_copy, the above check triggers a retry 
on failure and the overall fork can still succeed.

3. Any additional failures due to the above check would be just as 
intermittent as what happens now. It's not like some app would suddenly 
and systematically refuse to fork with this patch in place. At worst 
fork failures might become somewhat more frequent. Caveat: sometimes the 
parent's address space layout is especially hostile and fork does seem 
to fail systematically. However, this happens to me either way and can 
be fixed by restarting the offending process.

4. My experiments suggest that most access violations during fork arise 
precisely because dlls move, and moved dlls usually cause access 
violations. Those already generate noise on the list (emacs, anyone?), 
and the above fix would make that noise either disappear (successful 
retry) or merge with the usual gripes about "fork failed: resource 
temporarily unavailable" where it belongs.

5. Not performing the above check can cause forked programs to 
mysteriously fail at some later point after the fork appears to succeed. 
Given that we have at least one known example (cyggcc_s) I would argue 
that forks gone bad contribute to user frustration but do not generate 
mailing list traffic because it looks like an intermittent and 
non-reproducible problem with the user's code. WJM?

Ryan

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-05-24 17:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-11 18:31 Improvements to fork handling (1/5) Ryan Johnson
2011-05-17 11:14 ` Ryan Johnson
2011-05-17 16:45   ` Christopher Faylor
2011-05-22  1:41 ` Christopher Faylor
2011-05-22 13:05   ` Ryan Johnson
2011-05-22 20:29     ` Christopher Faylor
2011-05-23  1:36       ` Christopher Faylor
2011-05-24 17:49       ` Ryan Johnson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).