public inbox for cygwin@cygwin.com
* Debugging help for fork failure: resource temporarily unavailable
@ 2011-03-05 22:17 Ryan Johnson
  2011-03-06 15:03 ` chm
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ryan Johnson @ 2011-03-05 22:17 UTC
  To: cygwin

Hi all,

I'm hitting the oh-so-delightful fork failures when trying to compile a 
cross-compiler toolchain, which is a pain because one fork failure makes 
crosstool-ng start over. I've rebased, I've been over the BLODA (Windows 
Defender slipped in even after I rejected the download), and while they 
definitely helped, there's likely to be at least one fork failure while 
compiling a big project like glibc.

So, now comes my plea (I don't know enough about cygwin to do this 
myself). It seems like the usual culprit -- dll injection in the child 
at an address that the parent already used -- could easily be diagnosed 
by the code which notices and aborts the fork: given two dlls which want 
to use the same address in the child process, the one at a different 
address in the parent is probably to blame. Fingering this offending 
DLL, either as part of the fork failure message or in a log file of some 
sort, would make it infinitely easier for users to diagnose the problem, 
and would also give a much clearer idea of what really went wrong (we 
could order the BLODA by how often each app causes headaches, for example).

Might it be possible to do an LD_PRELOAD of some sort which hooks into 
fork() at the critical moment and prints the differences between 
/proc/$parent/maps and /proc/$child/maps? The code doesn't even need to 
be efficient; it just needs to be able to run when whatever internal 
helper of fork() returns an error but before the nascent child process 
is terminated.
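Once both maps are available, the comparison itself is simple. Below is a minimal sketch that diffs two saved /proc/PID/maps dumps by DLL load address. It assumes Cygwin's maps output follows the Linux layout (address range, perms, offset, dev, inode, pathname) and treats the lowest start address seen for each .dll path as that DLL's base; the function names are hypothetical.

```python
def dll_bases(maps_text):
    """Map each .dll path in a /proc/PID/maps dump to its lowest start address."""
    bases = {}
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname column
        path = fields[5].strip()
        if not path.lower().endswith(".dll"):
            continue
        start = int(fields[0].split("-")[0], 16)
        if path not in bases or start < bases[path]:
            bases[path] = start
    return bases


def diff_maps(parent_text, child_text):
    """List (path, parent_base, child_base) for every DLL whose base differs.

    A base of None means the DLL is not mapped in that process at all.
    """
    parent = dll_bases(parent_text)
    child = dll_bases(child_text)
    diffs = []
    for path in sorted(set(parent) | set(child)):
        p, c = parent.get(path), child.get(path)
        if p != c:
            diffs.append((path, p, c))
    return diffs
```

Feeding it the text of /proc/$parent/maps and /proc/$child/maps would name the DLLs that moved, which is the list a fork failure message would ideally print.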

If there exists such a convenient instrumentation point, I might be up 
to the task of exploiting it, but I wouldn't know where to start.

Thoughts? Ideas?
Ryan




--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Debugging help for fork failure: resource temporarily unavailable
  2011-03-05 22:17 Debugging help for fork failure: resource temporarily unavailable Ryan Johnson
@ 2011-03-06 15:03 ` chm
  2011-03-07 15:29 ` Ryan Johnson
  2011-03-09 10:34 ` Corinna Vinschen
  2 siblings, 0 replies; 15+ messages in thread
From: chm @ 2011-03-06 15:03 UTC
  To: cygwin; +Cc: Ryan Johnson

On 2:59 PM, Ryan Johnson wrote:
 >
 > I'm hitting the oh-so-delightful fork failures when trying
 > to compile a cross-compiler toolchain, which is a pain
 > because one fork failure makes crosstool-ng start over. I've
 > rebased, I've been over the BLODA (Windows Defender slipped
 > in even after I rejected the download), and while they
 > definitely helped there's likely to be at least one fork
 > failure while compiling a big project like glibc.
 >
 > So, now comes my plea (I don't know enough about cygwin to
 > do this myself). It seems like the usual culprit -- dll
 > injection in the child at an address that the parent already
 > used -- could easily be diagnosed by the code which notices
 > and aborts the fork: given two dlls which want to use the
 > same address in the child process, the one at a different
 > address in the parent is probably to blame. Fingering this
 > offending DLL, either as part of the fork failure message
 > or in a log file of some sort, would make it infinitely
 > easier for users to diagnose the problem, and would also
 > give a much clearer idea of what really went wrong (we could
 > order the BLODA by how often each app causes headaches, for
 > example).

I would like to second the motion.  This additional
information would be a help in diagnosing/discussing
and possibly repairing the problem.  My situation
involves many perl modules and the current solution
is to rebaseall, peflagsall, and perlrebase intermingled
with system reboots until things work.

--Chris

 > Might it be possible to do an LD_PRELOAD of some sort
 > which hooks into fork() at the critical moment and
 > prints the differences between /proc/$parent/maps and
 > /proc/$child/maps? The code doesn't even need to be
 > efficient; it just needs to be able to run when whatever
 > internal helper of fork() returns an error but before the
 > nascent child process is terminated.
 >
 > If there exists such a convenient instrumentation point, I
 > might be up to the task of exploiting it, but I wouldn't
 > know where to start.
 >
 > Thoughts? Ideas?
 > Ryan


* Re: Debugging help for fork failure: resource temporarily unavailable
  2011-03-05 22:17 Debugging help for fork failure: resource temporarily unavailable Ryan Johnson
  2011-03-06 15:03 ` chm
@ 2011-03-07 15:29 ` Ryan Johnson
  2011-03-09 10:34 ` Corinna Vinschen
  2 siblings, 0 replies; 15+ messages in thread
From: Ryan Johnson @ 2011-03-07 15:29 UTC
  To: cygwin

On 2:59 PM, Ryan Johnson wrote:
> I'm hitting the oh-so-delightful fork failures when trying to compile 
> a cross-compiler toolchain, which is a pain because one fork failure 
> makes crosstool-ng start over. I've rebased, I've been over the BLODA 
> (Windows Defender slipped in even after I rejected the download), and 
> while they definitely helped there's likely to be at least one fork 
> failure while compiling a big project like glibc.
>
> So, now comes my plea (I don't know enough about cygwin to do this 
> myself). It seems like the usual culprit -- dll injection in the child 
> at an address that the parent already used -- could easily be 
> diagnosed by the code which notices and aborts the fork: given two 
> dlls which want to use the same address in the child process, the one 
> at a different address in the parent is probably to blame. Fingering 
> this offending DLL, either as part of the fork failure message or in a 
> log file of some sort, would make it infinitely easier for users to 
> diagnose the problem, and would also give a much clearer idea of what 
> really went wrong (we could order the BLODA by how often each app 
> causes headaches, for example).
Actually, a follow-up question: what is the difference between the fork 
failures (e.g. resource unavailable) and the errors about 'failed to 
remap dll ...'? Looking at the code in dll_init.cc, if failure to remap 
a dll were really the source of fork failing, the error message should 
say so. Is there some other issue due to BLODA that also causes forks to 
fail?

Also, how does the (new?) peflagsall script interact with dll injection? 
It sounds like it's supposed to fix dll remapping problems on machines 
which support ASLR...

Thanks,
Ryan




* Re: Debugging help for fork failure: resource temporarily unavailable
  2011-03-05 22:17 Debugging help for fork failure: resource temporarily unavailable Ryan Johnson
  2011-03-06 15:03 ` chm
  2011-03-07 15:29 ` Ryan Johnson
@ 2011-03-09 10:34 ` Corinna Vinschen
  2011-03-09 17:04   ` Christopher Faylor
  2011-03-09 17:53   ` Ryan Johnson
  2 siblings, 2 replies; 15+ messages in thread
From: Corinna Vinschen @ 2011-03-09 10:34 UTC
  To: cygwin

On Mar  5 17:15, Ryan Johnson wrote:
> Hi all,
> 
> I'm hitting the oh-so-delightful fork failures when trying to
> compile a cross-compiler toolchain, which is a pain because one fork
> failure makes crosstool-ng start over. I've rebased, I've been over
> the BLODA (Windows Defender slipped in even after I rejected the
> download), and while they definitely helped there's likely to be at
> least one fork failure while compiling a big project like glibc.
> 
> So, now comes my plea (I don't know enough about cygwin to do this
> myself). It seems like the usual culprit -- dll injection in the
> child at an address that the parent already used -- could easily be
> diagnosed by the code which notices and aborts the fork: given two
> dlls which want to use the same address in the child process, the
> one at a different address in the parent is probably to blame.
> Fingering this offending DLL, either as part of the fork failure
> message or in a log file of some sort, would make it infinitely
> easier for users to diagnose the problem, and would also give a much
> clearer idea of what really went wrong (we could order the BLODA by
> how often each app causes headaches, for example).
> 
> Might it be possible to do an LD_PRELOAD of some sort which hooks
> into fork() at the critical moment and prints the differences
> between /proc/$parent/maps and /proc/$child/maps? The code doesn't
> even need to be efficient; it just needs to be able to run when
> whatever internal helper of fork() returns an error but before the
> nascent child process is terminated.
> 
> If there exists such a convenient instrumentation point, I might be
> up to the task of exploiting it, but I wouldn't know where to start.

It's not that easy.  LD_PRELOAD is only honored after the other
stuff to duplicate the parent process has already taken place.

This is definitely not something for 1.7.9, but maybe we can utilize
the functionality we already have on board at some point.  In
fhandler_process.cc we have the function format_process_maps(), which
creates a buffer with the content of /proc/$PID/maps.  It might be
possible to call this function from fork for parent and child if fork
fails for this reason, and print this information.

Just an idea.  Somebody still would have to do it(*).


Corinna


(*) http://cygwin.com/contrib.html


-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Debugging help for fork failure: resource temporarily unavailable
  2011-03-09 10:34 ` Corinna Vinschen
@ 2011-03-09 17:04   ` Christopher Faylor
  2011-03-09 17:53   ` Ryan Johnson
  1 sibling, 0 replies; 15+ messages in thread
From: Christopher Faylor @ 2011-03-09 17:04 UTC
  To: cygwin

On Wed, Mar 09, 2011 at 11:22:57AM +0100, Corinna Vinschen wrote:
>Just an idea.  Somebody still would have to do it(*).

I've been musing about some ways to make dll handling more robust.
Maybe I'll poke at it for 1.7.10.

cgf


* Re: Re: Debugging help for fork failure: resource temporarily unavailable
  2011-03-09 10:34 ` Corinna Vinschen
  2011-03-09 17:04   ` Christopher Faylor
@ 2011-03-09 17:53   ` Ryan Johnson
  2011-03-12 20:57     ` Jon TURNEY
  1 sibling, 1 reply; 15+ messages in thread
From: Ryan Johnson @ 2011-03-09 17:53 UTC
  To: cygwin

On 2:59 PM, Corinna Vinschen wrote:
> On Mar  5 17:15, Ryan Johnson wrote:
>> Might it be possible to do an LD_PRELOAD of some sort which hooks
>> into fork() at the critical moment and prints the differences
>> between /proc/$parent/maps and /proc/$child/maps? The code doesn't
>> even need to be efficient; it just needs to be able to run when
>> whatever internal helper of fork() returns an error but before the
>> nascent child process is terminated.
>>
>> If there exists such a convenient instrumentation point, I might be
>> up to the task of exploiting it, but I wouldn't know where to start.
> It's not that easy.  LD_PRELOAD is only honored after the other
> stuff to duplicate the parent process has already taken place.
>
> This is definitely not something for 1.7.9, but maybe we can utilize
> the functionality we already have on board at one point.  In
> fhandler_process.cc we have the function format_process_maps(), which
> creates a buffer with the content of /proc/$PID/maps.  It might be
> possible to call this function from fork for parent and child if fork
> fails for this reason, and print this information.
>
> Just an idea.  Somebody still would have to do it(*).
I was actually thinking of an LD_PRELOAD in the parent process which 
would cause new/additional code to execute when that parent forks. 
However, after poking around in the code I'm not sure this is possible, 
since IIRC LD_PRELOAD can only override dynamically-linked functions. 
That probably means an actual change to cygwin is required, as you suggest.

BTW, while looking at the code I noticed a potential source of remap 
problems: if B depends on A, and we remap A first, then only A's 
location will be checked carefully; B will be pulled in wherever it 
happens to end up when we do the full load of A. The code seems to 
assume that every DLL we try to remap is currently not loaded.

I'm actually not sure what would happen when time came to remap B, 
because loading it would just return the handle we didn't know we had, 
and closing that handle wouldn't take its reference count to zero.  
Incidentally, this same problem would arise if a BLODA injected a DLL 
into the process -- that DLL would be on the todo list for fork() to 
process (because it was also injected into the parent process), but 
would already be loaded by the time we try to remap it. Also, if we do 
want to force Windows not to put a dll in a certain address, wouldn't it 
make more sense to reserve the (wrong) space it went into on the first 
try? Right now if the offending location is higher than the one we want, 
nothing stops Windows from just putting it right back in its old spot 
because the code only reserves locations lower than the desired one.

Is this accurate or am I missing something here?

I assume there's a way to enumerate the dlls loaded in a given process; 
would it make sense to use a three-step algorithm?
1. Unload all currently-loaded dlls, complaining loudly to stderr or a 
log file (these are due to BLODA and deserve to be called out)
2. Load without deps every DLL and make sure it lands at the right 
address (using memory reservation tricks if needed)
3. Reload with deps every DLL. Presumably, once it has landed correctly 
once, it will do so thereafter (the current code assumes this, at least)
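The three steps can be written down as an ordered plan. This hypothetical sketch only builds the list of actions (the action names are invented for illustration) instead of calling any Windows API, and it assumes the child can tell which DLLs were already mapped before remapping starts.

```python
def plan_remap(parent_dlls, child_loaded):
    """Build the ordered action list for the proposed three-step remap.

    parent_dlls: list of (path, base) in the parent's dll_list order.
    child_loaded: set of paths already mapped in the child beforehand.
    Returns a list of (action, path, base) tuples; names are hypothetical.
    """
    actions = []
    # Step 1: anything already present was not put there by us; unload and report it.
    for path in sorted(child_loaded):
        actions.append(("unload_and_report", path, None))
    # Step 2: map each DLL without resolving imports, pinned to the parent's base.
    for path, base in parent_dlls:
        actions.append(("load_no_deps_at", path, base))
    # Step 3: load each DLL normally so imports resolve and entry points run.
    for path, base in parent_dlls:
        actions.append(("load_full", path, base))
    return actions
```

Keeping the plan separate from its execution would also make it easy to log exactly which step failed.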

In theory, the first step might allow cygwin to resist dll injection 
(maybe on an opt-out basis?), though I don't know what the consequences 
of that choice would be.

The third step would be significantly easier if we had a dependency 
graph so that we could ensure dependencies always get processed before 
they're needed, but I don't know if that's feasible. How 
expensive/embeddable is cygcheck?
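Given such a graph, getting a dependencies-first order is a plain topological sort. A minimal sketch using Python's standard graphlib module (Python 3.9+); the example graph is assumed for illustration, not taken from real import tables.

```python
from graphlib import TopologicalSorter


def dependency_first_order(deps):
    """Order DLLs so each one comes after every DLL it imports.

    deps maps a DLL name to the set of DLL names it depends on.
    Raises graphlib.CycleError if the imports are circular.
    """
    return list(TopologicalSorter(deps).static_order())


# Illustrative graph (assumed): pyglib imports glib, glib imports three others.
example = {
    "cygpyglib-2.0-python2.6-0.dll": {"cygglib-2.0-0.dll"},
    "cygglib-2.0-0.dll": {"cygiconv-2.dll", "cygintl-8.dll", "cygpcre-0.dll"},
    "cygintl-8.dll": {"cygiconv-2.dll"},
}
```

The order among unrelated DLLs is not fixed, only that every dependency precedes its dependents.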

Ryan



* Re: Debugging help for fork failure: resource temporarily unavailable
  2011-03-09 17:53   ` Ryan Johnson
@ 2011-03-12 20:57     ` Jon TURNEY
  2011-03-15 15:04       ` Ryan Johnson
  0 siblings, 1 reply; 15+ messages in thread
From: Jon TURNEY @ 2011-03-12 20:57 UTC
  To: cygwin

On 07/03/2011 15:10, Ryan Johnson wrote:
> Actually, a follow-up question: what is the difference between the fork (e.g.
> resource unavailable) failures vs. the errors about 'failed to remap dll...'
> ? Looking at the code in dll_init.cc, if failure to remap a dll were really
> the source of fork failing, the error message should say so. Is there some
> other issue due to BLODA that also causes forks to fail?

First of all, I don't know the answer to this question :-)

EAGAIN is the error which fork() returns when a remap failure occurs, so if
you don't have the stderr output, I guess that might be all you'll see?
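From the caller's side, a remap failure is therefore indistinguishable from any other EAGAIN. A small sketch of how a Python caller might at least flag that case; the helper name and message wording are hypothetical.

```python
import errno
import os


def describe_fork_error(exc):
    """Describe an OSError raised by os.fork(), flagging the EAGAIN case."""
    msg = os.strerror(exc.errno)
    if exc.errno == errno.EAGAIN:
        # A failed DLL remap in the child also comes back as EAGAIN, so the
        # errno alone cannot separate it from a genuine resource shortage;
        # any remap diagnostic is printed to stderr instead.
        return "EAGAIN (" + msg + "): look for 'unable to remap' on stderr"
    return msg
```

A caller would wrap os.fork() in try/except OSError and pass the exception in.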

Secondly, I'm by no means an expert on this issue; these are just my observations.

On 09/03/2011 17:04, Ryan Johnson wrote
> BTW, while looking at the code I noticed a potential source of remap problems:
> if B depends on A, and we remap A first, then only A's location will be
> checked carefully; B will be pulled in wherever it happens to end up when we
> do the full load of A. The code seems to assume that every DLL we try to remap
> is currently not loaded.
> 
> I'm actually not sure what would happen when time came to remap B, because
> loading it would just return the handle we didn't know we had, and closing
> that handle wouldn't take its reference count to zero.  

I too have idly mused that there might be an issue with dependent DLLs here.

But, since dll_list::load_after_fork() walks the dll list in the same order as
the dlopen() calls occur, I've never been able to convince myself there is a
real problem, barring esoteric scenarios like: B depends on A, C depends on A,
load B, load C (C collides with A so loads at non-preferred address), unload
B, fork

That doesn't match what really happens though: where problems are seen it's
often with python or perl, which dynamically load libraries when modules are
imported, but won't unload them in normal use.

> Incidentally, this
> same problem would arise if a BLODA injected a DLL into the process -- that
> DLL would be on the todo list for fork() to process (because it was also
> injected into the parent process), but would already be loaded by the time we
> try to remap it. Also, if we do want to force Windows not to put a dll in a
> certain address, wouldn't it make more sense to reserve the (wrong) space it
> went into on the first try? Right now if the offending location is higher than
> the one we want, nothing stops Windows from just putting it right back in its
> old spot because the code only reserves locations lower than the desired one.
> 
> Is this accurate or am I missing something here?

I'm not sure that particular scenario with injected DLLs is possible, as the
list traversed in dll_list::load_after_fork() is only of dynamically loaded
cygwin-based DLLs?

I think some more investigation of what the actual memory layout of DLLs is
when a remap failure occurs is needed before trying to fix the problem.

So, here is an actual test case, distilled from a failure I have observed
running the twisted test suite:

cygpyglib-2.0-python2.6-0.dll (glib bindings for python) depends on
cygglib-2.0-0.dll, but also has the same preferred base address, as there is a
collision in the hash of the filename used by --enable-auto-image-base.

$ objdump -p /usr/bin/cygpyglib-2.0-python2.6-0.dll | grep ^ImageBase
ImageBase               6aa40000

$ objdump -p /usr/bin/cygglib-2.0-0.dll | grep ^ImageBase
ImageBase               6aa40000

A small Python program which just loads the DLLs and then forks:

#!/usr/bin/env python

import os
import glib # comment this line out to succeed

pid = os.fork()
if pid:
        # wait for child to exit
        os.waitpid(pid, 0)

Here's the failure (with a little extra debugging output inserted into
cygwin1.dll to make it a little clearer what it's trying to do):

$ ./testcase.py
    431 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygiconv-2.dll @ 0x674C0000 using DONT_RESOLVE_DLL_REFERENCES
    719 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygintl-8.dll @ 0x6F5C0000 using DONT_RESOLVE_DLL_REFERENCES
    979 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygpcre-0.dll @ 0x64240000 using DONT_RESOLVE_DLL_REFERENCES
   1227 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
   1263 [main] python 3008 dll_list::load_after_fork: reserve_upto 0x18C40000 to try to force it to load there
   1473 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
   1620 [main] python 3008 C:\cygwin\bin\python.exe: *** fatal error - unable to remap C:\cygwin\bin\cygglib-2.0-0.dll to same address as parent: 0x18C40000 != 0x6AA40000

and I've confirmed that in the parent, cygpyglib-2.0-python2.6-0.dll loads at
0x6AA40000 and cygglib-2.0-0.dll loads at 0x18C40000.

At a wild guess, it looks like LoadLibraryEx() maps DLLs into memory starting
from the top of the dependency chain, but then calls the DLL's entry point
starting from the bottom of the dependency chain (which makes all kinds of
sense, but leads to this inversion of the load order in the child).
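That guess amounts to two traversals of one dependency tree: mapping in pre-order (the requested DLL before its imports) and initialisation in post-order (imports first). The model below is a sketch of the hypothesis only, not of documented LoadLibraryEx() behaviour.

```python
def map_and_init_order(root, deps):
    """Model the guessed loader behaviour for a single load of `root`.

    deps maps a DLL to the DLLs it imports. Returns (map_order, init_order):
    DLLs are assumed to be mapped top-down but initialised bottom-up.
    """
    map_order, init_order, seen = [], [], set()

    def visit(dll):
        if dll in seen:
            return  # already mapped by an earlier branch
        seen.add(dll)
        map_order.append(dll)   # mapped before its imports
        for dep in deps.get(dll, ()):
            visit(dep)
        init_order.append(dll)  # entry point runs after its imports' entry points

    visit(root)
    return map_order, init_order
```

For cygpyglib importing cygglib, the parent maps cygpyglib first (so it wins the shared base 0x6AA40000), while the init order, cygglib then cygpyglib, is what the child replays, so cygglib now grabs 0x6AA40000 instead.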

This is trivially worked around by rebasing one of the conflicting DLLs to a
different address, e.g.:

$ rebase -b 0x60000000 /usr/bin/cygglib-2.0-0.dll

$ ./testcase.py
      2 [main] python 2916 dll_list::load_after_fork: C:\cygwin\bin\cyggcc_s-1.dll (type 1743781888) expected @ 0x0
    138 [main] python 2916 dll_list::load_after_fork: C:\cygwin\bin\libpython2.6.dll (type 1741422592) expected @ 0x0
    728 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygiconv-2.dll @ 0x674C0000 using DONT_RESOLVE_DLL_REFERENCES
   1049 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygintl-8.dll @ 0x6F5C0000 using DONT_RESOLVE_DLL_REFERENCES
   1421 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygpcre-0.dll @ 0x64240000 using DONT_RESOLVE_DLL_REFERENCES
   1945 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygglib-2.0-0.dll @ 0x60000000 using DONT_RESOLVE_DLL_REFERENCES
   2791 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cyggthread-2.0-0.dll @ 0x6C000000 using DONT_RESOLVE_DLL_REFERENCES
   3110 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygpyglib-2.0-python2.6-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
   3647 [main] python 2916 dll_list::load_after_fork: LoadLibrary \\?\C:\cygwin\lib\python2.6\site-packages\gtk-2.0\glib\_glib.dll @ 0x61680000 using DONT_RESOLVE_DLL_REFERENCES

This perhaps explains some remap issues which rebaseall fixes, as that avoids
the possibility of dependent DLLs with colliding preferred base addresses.
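The idea behind rebaseall can be illustrated by packing DLLs into disjoint, 64 KiB-aligned ranges. This sketch mimics only the idea, not rebase's actual code; the 0x70000000 starting address and the downward direction are assumptions.

```python
ALIGN = 0x10000  # Windows allocation granularity (64 KiB)


def assign_bases(dlls, top=0x70000000):
    """Give each DLL a non-overlapping base, packing downward from `top`.

    dlls is a list of (name, image_size); each size is rounded up to 64 KiB.
    Returns {name: base}. The default `top` is an assumed value.
    """
    bases = {}
    next_top = top
    for name, size in dlls:
        span = (size + ALIGN - 1) // ALIGN * ALIGN
        next_top -= span
        bases[name] = next_top
    return bases
```

Because every DLL gets its own range, no dependency can collide with its dependent, whatever order the loader maps them in.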

I'm not sure what can be done to fix this programmatically.

> I assume there's a way to enumerate the dlls loaded in a given process; would
> it make sense to use a three-step algorithm?
> 1. Unload all currently-loaded dlls, complaining loudly to stderr or a log
> file (these are due to BLODA and deserve to be called out)
> 2. Load without deps every DLL and make sure it lands at the right address
> (using memory reservation tricks if needed)
> 3. Reload with deps every DLL. Presumably once it has landed correctly once it
> will do so thereafter (the current code assumes this, at least)

Doing 2 & 3 is an interesting idea, the first call to let you pin it at a
particular address and the second to make it executable.

I've no idea what happens, but unfortunately, the comments in
dll_list::load_after_fork() seem to suggest this doesn't work, as the DLL's
entry point doesn't get called the second time it's loaded.

> In theory, the first step might allow cygwin to resist dll injection (maybe on
> an opt-out basis?), though I don't know what the consequences of that choice
> would be.
> 
> The third step would be significantly easier if we had a dependency graph so
> that we could ensure dependencies always get processed before they're needed,
> but I don't know if that's feasible. How expensive/embeddable is cygcheck?

Another idea (assuming my guess about LoadLibrary() behaviour above is
correct) would be to have dlopen() rather than simply call LoadLibrary() on a
DLL, construct the dependency tree of the DLL it's been asked to open and load
the DLLs starting from the bottom, so that the order of loading into memory
matches the order which entry points are called (and hence the order in
dll_list)? (This would have the advantage of not making fork() even more
heavyweight)

Alternatively, maybe all that is needed is a slightly more complex approach to
forcing the DLL to load at a particular address?  If reserve_upto() has been
called, but it loads higher than that, can we assume load order inversion has
occurred, and try to block it from loading at its preferred address by
VirtualAlloc()-ing there as well? I think I might even try to write a patch to
do that...
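A toy model shows why reserving only below the wanted address is not enough when the DLL's preferred base lies above it. The placement policy here (preferred base if free, else the lowest free 64 KiB slot) is an assumption chosen to mirror the log above, not documented Windows behaviour, and the DLL size is made up.

```python
ALIGN = 0x10000  # Windows allocation granularity (64 KiB)


class ToyAddressSpace:
    """Toy model of DLL placement; NOT real Windows behaviour.

    Assumed policy: use the preferred base if that range is free, otherwise
    take the lowest free 64 KiB-aligned range at or above `floor`.
    """

    def __init__(self, floor=0x10000, ceiling=0x80000000):
        self.floor = floor
        self.ceiling = ceiling
        self.used = []  # half-open (start, end) ranges that are taken

    def is_free(self, start, size):
        end = start + size
        if start < self.floor or end > self.ceiling:
            return False
        return all(end <= s or start >= e for s, e in self.used)

    def reserve(self, start, size):
        """Stand-in for a VirtualAlloc(MEM_RESERVE) placeholder."""
        self.used.append((start, start + size))

    def load(self, preferred, size):
        base = preferred
        if not self.is_free(base, size):
            base = self.floor
            while not self.is_free(base, size):
                base += ALIGN
                if base + size > self.ceiling:
                    raise MemoryError("no free range large enough")
        self.reserve(base, size)
        return base


def remap_in_child(want, preferred, size, block_preferred):
    """Try to land a DLL at `want`, its address in the parent."""
    space = ToyAddressSpace()
    # reserve_upto-style: occupy everything below the wanted address.
    space.reserve(space.floor, want - space.floor)
    if block_preferred:
        # Proposed extra step: also occupy the DLL's preferred base.
        space.reserve(preferred, size)
    return space.load(preferred, size)
```

With only the low reservation, remap_in_child(0x18C40000, 0x6AA40000, 0x100000, False) lands back on 0x6AA40000, as in the failing log; blocking the preferred base as well leaves 0x18C40000 as the lowest free slot in this model.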



* Re: Re: Debugging help for fork failure: resource temporarily unavailable
  2011-03-12 20:57     ` Jon TURNEY
@ 2011-03-15 15:04       ` Ryan Johnson
  2011-03-15 17:52         ` BLODA detection (was Re: Debugging help for fork failure: resource temporarily unavailable) Henry S. Thompson
  2011-04-04 18:40         ` Debugging help for fork failure: resource temporarily unavailable Jon TURNEY
  0 siblings, 2 replies; 15+ messages in thread
From: Ryan Johnson @ 2011-03-15 15:04 UTC
  To: Jon TURNEY; +Cc: cygwin

On 2:59 PM, Jon TURNEY wrote:
> On 09/03/2011 17:04, Ryan Johnson wrote
>> BTW, while looking at the code I noticed a potential source of remap problems:
>> if B depends on A, and we remap A first, then only A's location will be
>> checked carefully; B will be pulled in wherever it happens to end up when we
>> do the full load of A. The code seems to assume that every DLL we try to remap
>> is currently not loaded.
>>
>> I'm actually not sure what would happen when time came to remap B, because
>> loading it would just return the handle we didn't know we had, and closing
>> that handle wouldn't take its reference count to zero.
> I too have idly mused that there might be an issue with dependent DLLs here.
>
> But, since dll_list::load_after_fork() walks the dll list in the same order as
> the dlopen() calls occur, I've never been able to convince myself there is a
> real problem, barring esoteric scenarios like: B depends on A, C depends on A,
> load B, load C (C collides with A so loads at non-preferred address), unload
> B, fork
Oh, I see what you mean... in theory, asking Windows to load the same 
dlls in the same order should put them at the same addresses.
> That doesn't match what really happens though: where problems are seen it's
> often with python or perl, which dynamically load libraries when modules are
> imported, but won't unload them in normal use.
All of this assumes Windows is consistent in choosing locations when 
conflicts are involved. IOW, consider the case that B depends on A, with 
A and B both conflicting with a later-loaded C. The first time A and C 
load, Windows will choose alternate locations for them, and if that order 
changes in the child, it's totally possible that A ends up in the child 
where C was in the parent.

>> Incidentally, this
>> same problem would arise if a BLODA injected a DLL into the process -- that
>> DLL would be on the todo list for fork() to process (because it was also
>> injected into the parent process), but would already be loaded by the time we
>> try to remap it. Also, if we do want to force Windows not to put a dll in a
>> certain address, wouldn't it make more sense to reserve the (wrong) space it
>> went into on the first try? Right now if the offending location is higher than
>> the one we want, nothing stops Windows from just putting it right back in its
>> old spot because the code only reserves locations lower than the desired one.
>>
>> Is this accurate or am I missing something here?
> I'm not sure that particular scenario with injected DLLs is possible, as the
> list traversed in dll_list::load_after_fork() is only of dynamically loaded
> cygwin-based DLLs?
Oh, so injected dlls, though not statically linked in, still wouldn't be 
on this list?

BTW, I found a good way to identify, if not fix, BLODA: given an app 
which loads no libraries at runtime -- such as 'ls' -- any dlls 
mentioned in /proc/$$/maps which cygcheck does not mention are probably 
dodgy. In my case, Windows Live (which I didn't think was even installed 
on my machine) has injected a WLIDNSP.DLL ("Microsoft Windows Live ID 
Namespace Provider") in all my processes.
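That check is easy to script as a set difference between DLL basenames in /proc/$$/maps and those listed by cygcheck. A sketch assuming cygcheck prints one path per line and that case-insensitive basename matching is good enough; expect some false positives from DLLs Windows itself loads at runtime.

```python
import ntpath  # ntpath.basename splits on both '/' and '\\'


def dll_names_from_maps(maps_text):
    """Lower-cased basenames of every .dll mapped in a /proc/PID/maps dump."""
    names = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip().lower().endswith(".dll"):
            names.add(ntpath.basename(fields[5].strip()).lower())
    return names


def dll_names_from_cygcheck(cygcheck_text):
    """Lower-cased basenames of every .dll listed in `cygcheck PROGRAM` output."""
    names = set()
    for line in cygcheck_text.splitlines():
        path = line.strip()
        if path.lower().endswith(".dll"):
            names.add(ntpath.basename(path).lower())
    return names


def suspect_dlls(maps_text, cygcheck_text):
    """DLLs mapped into the process but missing from its static dependency tree."""
    return sorted(dll_names_from_maps(maps_text) - dll_names_from_cygcheck(cygcheck_text))
```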

> $ objdump -p /usr/bin/cygpyglib-2.0-python2.6-0.dll | grep ^ImageBase
> ImageBase               6aa40000
>
> $ objdump -p /usr/bin/cygglib-2.0-0.dll | grep ^ImageBase
> ImageBase               6aa40000
>
> C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
>     1263 [main] python 3008 dll_list::load_after_fork: reserve_upto 0x18C40000
> to try to force it to load there
>     1473 [main] python 3008 dll_list::load_after_fork: LoadLibrary
> C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
>     1620 [main] python 3008 C:\cygwin\bin\python.exe: *** fatal error - unable
> to remap C:\cygwin\bin\cygglib-2.0-0.dll to same address as parent: 0x18C40000
> != 0x6AA40000
>
> and I've confirmed that in the parent, cygpyglib-2.0-python2.6-0.dll loads at
> 0x6AA40000 and cygglib-2.0-0.dll loads at 0x18C40000.
>
> At a wild guess, it looks like LoadLibraryEx() maps DLLs into memory starting
> from the top of the dependency chain, but then calls the DLL's entry point
> starting from the bottom of the dependency chain (which makes all kinds of
> sense, but leads to this inversion of the load order in the child)
>
So the problem basically arises because dlls in the child are not 
actually loaded in the same order as in the parent?  In this case I 
assume that cygpyglib depends on cygglib, which suggests that we could 
avoid a lot of trouble by handling dependencies (such as cygglib) first.

Also, it looks like the above is exactly the case I suspected -- the 
offending dll attempts to load *higher* than where we want it, so 
reserving space below does nothing for us.
>> I assume there's a way to enumerate the dlls loaded in a given process; would
>> it make sense to use a three-step algorithm?
>> 1. Unload all currently-loaded dlls, complaining loudly to stderr or a log
>> file (these are due to BLODA and deserve to be called out)
>> 2. Load without deps every DLL and make sure it lands at the right address
>> (using memory reservation tricks if needed)
>> 3. Reload with deps every DLL. Presumably once it has landed correctly once it
>> will do so thereafter (the current code assumes this, at least)
> Doing 2 & 3 is an interesting idea, the first call to let you pin it at a
> particular address and the second to make it executable.
>
> I've no idea what happens, but unfortunately, the comments in
> dll_list::load_after_fork() seem to suggest this doesn't work, as the DLLs
> entry point doesn't get called the second time it's loaded.
The code currently unloads the library completely and then reloads it 
normally, which I assumed was to ensure entry points get called.

>> In theory, the first step might allow cygwin to resist dll injection (maybe on
>> an opt-out basis?), though I don't know what the consequences of that choice
>> would be.
>>
>> The third step would be significantly easier if we had a dependency graph so
>> that we could ensure dependencies always get processed before they're needed,
>> but I don't know if that's feasible. How expensive/embeddable is cygcheck?
> Another idea (assuming my guess about LoadLibrary() behaviour above is
> correct) would be to have dlopen() rather than simply call LoadLibrary() on a
> DLL, construct the dependency tree of the DLL it's been asked to open and load
> the DLLs starting from the bottom, so that the order of loading into memory
> matches the order which entry points are called (and hence the order in
> dll_list)? (This would have the advantage of not making fork() even more
> heavyweight)
Some variant of objdump -p $THE_DLL | grep 'DLL Name'?
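For what it's worth, the objdump route is easy to scrape. A minimal sketch, assuming the import-table lines look like "DLL Name: cygwin1.dll" as binutils objdump -p prints them for PE files; it only parses captured text (it does not run objdump), and direct_imports is a hypothetical helper.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Extract imported DLL names from captured `objdump -p some.dll` output,
// i.e. the equivalent of: objdump -p $THE_DLL | grep 'DLL Name'
std::vector<std::string> direct_imports(const std::string& objdump_output) {
    const std::string key = "DLL Name:";
    const char* ws = " \t\r";
    std::vector<std::string> names;
    std::istringstream in(objdump_output);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type pos = line.find(key);
        if (pos == std::string::npos) continue;  // not an import-table line
        std::string name = line.substr(pos + key.size());
        // Trim the surrounding whitespace objdump puts around the name.
        std::string::size_type first = name.find_first_not_of(ws);
        if (first == std::string::npos) continue;
        std::string::size_type last = name.find_last_not_of(ws);
        names.push_back(name.substr(first, last - first + 1));
    }
    return names;
}
```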

It might also make sense for the parent process to record some ordering 
information at dlopen time in case it forks later. Given that the dlls 
are opening anyway it would probably be cheap to do it then. Just build 
a tree of all dlls which the current dlopen() triggers dlopen() calls 
for. Alternatively (simpler?) just make dlopen() add dlls to its list 
just before it returns. That way, any recursive calls will add the 
dependencies to the list first. No special data structures needed. Only 
problem is, I can't see where in the source this magical list is 
generated in the first place :(
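The "add to the list just before returning" idea is a post-order depth-first traversal, which by construction puts every dependency ahead of the DLLs that need it. A sketch, assuming the dependency graph is already known (for example from objdump's import tables) and acyclic; DepGraph, open_recursive and load_order are hypothetical names, and this is not how dll_list is actually populated.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using DepGraph = std::map<std::string, std::vector<std::string>>;  // dll -> direct imports

// Simulated recursive "dlopen": open the dependencies first, and append this
// DLL to the list only just before returning (post-order).
void open_recursive(const DepGraph& graph, const std::string& dll,
                    std::set<std::string>& seen, std::vector<std::string>& list) {
    if (!seen.insert(dll).second) return;  // already opened (guards against revisits)
    auto it = graph.find(dll);
    if (it != graph.end())
        for (const std::string& dep : it->second) open_recursive(graph, dep, seen, list);
    list.push_back(dll);  // every dependency is already in the list by now
}

// Order in which DLLs would be recorded when `root` is opened.
std::vector<std::string> load_order(const DepGraph& graph, const std::string& root) {
    std::set<std::string> seen;
    std::vector<std::string> list;
    open_recursive(graph, root, seen, list);
    return list;
}
```

With cygpyglib importing cygglib and cygwin1, and cygglib importing cygwin1, opening cygpyglib records cygwin1.dll, cygglib-2.0-0.dll, cygpyglib-2.0-python2.6-0.dll in that order. If the graph had a cycle, the seen set would still stop the recursion, but the result would no longer be a valid dependency order.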

> Alternatively, maybe all that is needed is a slightly more complex approach to
> forcing the DLL to load at a particular address?  If reserve_upto() has been
> called, but it loads higher than that, can we assume load order inversion has
> occurred, and try to block it from loading at its preferred address by
> VirtualAlloc()-ing there as well? I think I might even try to write a patch to
> do that...
The second approach might be easier to hack together quickly, but the 
first would actually make fork() more efficient and eliminate a lot of 
code: it's likely that all the rebasing/remapping fallbacks could 
disappear.
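Jon's blocking idea can be tried out in a toy model. Assumptions, all hypothetical: one slot per DLL, a made-up relocation area at 0x10000000, and reserve()/release() standing in for VirtualAlloc() with MEM_RESERVE and VirtualFree() with MEM_RELEASE. ToySpace, DllInfo and replay_with_blocking are invented names. Note that the child's cygglib only lands at the parent's address because the toy relocates deterministically; whether Windows does is exactly what is unclear.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy address space (invented relocation policy, one slot per DLL).
struct ToySpace {
    static constexpr std::uint32_t kFallback = 0x10000000;  // hypothetical relocation area
    static constexpr std::uint32_t kStep = 0x100000;
    std::map<std::uint32_t, std::string> slots;              // base -> owner

    std::uint32_t map_dll(const std::string& name, std::uint32_t preferred) {
        std::uint32_t addr = preferred;
        if (slots.count(addr)) {
            addr = kFallback;
            while (slots.count(addr)) addr += kStep;
        }
        slots[addr] = name;
        return addr;
    }
    // Stand-in for VirtualAlloc(..., MEM_RESERVE, ...): block a free slot.
    bool reserve(std::uint32_t addr) {
        if (slots.count(addr)) return false;  // already in use, cannot block it
        slots[addr] = "<reserved>";
        return true;
    }
    // Stand-in for VirtualFree(..., MEM_RELEASE).
    void release(std::uint32_t addr) { slots.erase(addr); }
};

struct DllInfo {
    std::string name;
    std::uint32_t image_base;   // preferred address from the PE header
    std::uint32_t parent_addr;  // where the parent actually has it
};

// Replay dll_list in the child. When the parent's copy was relocated away from
// its ImageBase, block the ImageBase first so the child's copy cannot take it,
// then lift the block so later DLLs (e.g. its owner in the parent) can use it.
std::map<std::string, std::uint32_t> replay_with_blocking(const std::vector<DllInfo>& dll_list) {
    ToySpace space;
    std::map<std::string, std::uint32_t> placed;
    for (const DllInfo& dll : dll_list) {
        bool blocked = dll.parent_addr != dll.image_base && space.reserve(dll.image_base);
        placed[dll.name] = space.map_dll(dll.name, dll.image_base);
        if (blocked) space.release(dll.image_base);
    }
    return placed;
}
```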

A third alternative would be to traverse the remaining list of dlls and 
find the one that we should have loaded first. This would have to be 
recursive to handle the case where several dlls map to the same base, 
but might otherwise be workable.

Ryan


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 15+ messages in thread

* BLODA detection (was Re: Debugging help for fork failure: resource temporarily unavailable)
  2011-03-15 15:04       ` Ryan Johnson
@ 2011-03-15 17:52         ` Henry S. Thompson
  2011-03-16 19:55           ` Ryan Johnson
  2011-04-04 18:40         ` Debugging help for fork failure: resource temporarily unavailable Jon TURNEY
  1 sibling, 1 reply; 15+ messages in thread
From: Henry S. Thompson @ 2011-03-15 17:52 UTC (permalink / raw)
  To: cygwin

Ryan Johnson writes:

> BTW, I found a good way to identify, if not fix, BLODA: given an app
> which loads no libraries at runtime -- such as 'ls' -- any dlls
> mentioned in /proc/$$/maps which cygcheck does not mention are
> probably dodgy. In my case, Windows Live (which I didn't think was
> even installed on my machine) has injected a WLIDNSP.DLL ("Microsoft
> Windows Live ID Namespace Provider") in all my processes.

This would be super-cool if true, but it doesn't work for me. . .

If I try, I find

 C:\Windows\system32\ntmarta.dll
 C:\Windows\SysWOW64\sechost.dll
 C:\Windows\syswow64\WLDAP32.dll

in /proc/[ls procid]/maps but not in cygcheck output, but none of
those are BLODA, right?

[Note also that maps shows many things in syswow64 which cygcheck
shows in system32, but presumably that's because cygcheck itself is a
32-bit app, is it?]

ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BLODA detection (was Re: Debugging help for fork failure: resource temporarily unavailable)
  2011-03-15 17:52         ` BLODA detection (was Re: Debugging help for fork failure: resource temporarily unavailable) Henry S. Thompson
@ 2011-03-16 19:55           ` Ryan Johnson
  2011-04-04 14:52             ` Jon TURNEY
  0 siblings, 1 reply; 15+ messages in thread
From: Ryan Johnson @ 2011-03-16 19:55 UTC (permalink / raw)
  To: Henry S. Thompson; +Cc: cygwin

On 2:59 PM, Henry S. Thompson wrote:
> Ryan Johnson writes:
>
>> BTW, I found a good way to identify, if not fix, BLODA: given an app
>> which loads no libraries at runtime -- such as 'ls' -- any dlls
>> mentioned in /proc/$$/maps which cygcheck does not mention are
>> probably dodgy. In my case, Windows Live (which I didn't think was
>> even installed on my machine) has injected a WLIDNSP.DLL ("Microsoft
>> Windows Live ID Namespace Provider") in all my processes.
> This would be super-cool if true, but it doesn't work for me. . .
>
> If I try, I find
>
>   C:\Windows\system32\ntmarta.dll
>   C:\Windows\SysWOW64\sechost.dll
>   C:\Windows\syswow64\WLDAP32.dll
>
> in /proc/[ls procid]/maps but not in cygcheck output, but none of
> those are BLODA, right?
>
> [Note also that maps shows many things in syswow64 which cygcheck
> shows in system32, but presumably that's because cygcheck itself is a
> 32-bit app, is it?]
>
Interesting...

$ join -i -v 1 <(cat /proc/$$/maps | sed 's;^.*/;;' | sort -f) \
    <(cygcheck $(cat /proc/$$/winexename) | sed 's;^.*\\;;' | sort -f)
apphelp.dll
DNSAPI.dll
IMM32.DLL
MSCTF.dll
mswsock.dll
napinsp.dll
NLAapi.dll
NSI.dll
pnrpnsp.dll
PSAPI.DLL
sechost.dll
SHLWAPI.dll
winmm.dll
winrnr.dll
WLIDNSP.DLL
ws2_32.dll
wshbth.dll

The above shows all dlls loaded by the process which are not linked in 
at compile time. Does bash really load so many dynamic libraries, or is 
cygcheck missing things?
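For reference, the pipeline above is a case-insensitive set difference of basenames. The C++ sketch below computes the same thing from path lists that are assumed to be collected already (it neither reads /proc nor runs cygcheck); unexplained_dlls, lower and basename_after are hypothetical names. Since system DLLs can load other DLLs at run time, the result is a list of candidates to look at, not proof of BLODA.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Lower-case copy, so comparisons ignore case (like join -i / sort -f).
std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Text after the last separator, like sed 's;^.*/;;' or sed 's;^.*\\;;'.
std::string basename_after(const std::string& path, char sep) {
    std::string::size_type pos = path.rfind(sep);
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// DLLs in the maps list (POSIX paths) that cygcheck (Windows paths) doesn't
// mention: the same set difference as join -i -v 1, without duplicates.
std::vector<std::string> unexplained_dlls(const std::vector<std::string>& maps_paths,
                                          const std::vector<std::string>& cygcheck_paths) {
    std::set<std::string> linked;
    for (const std::string& p : cygcheck_paths) linked.insert(lower(basename_after(p, '\\')));
    std::set<std::string> seen;
    std::vector<std::string> extra;
    for (const std::string& p : maps_paths) {
        std::string name = basename_after(p, '/');
        std::string key = lower(name);
        if (!linked.count(key) && seen.insert(key).second) extra.push_back(name);
    }
    return extra;
}
```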

Ryan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BLODA detection (was Re: Debugging help for fork failure: resource temporarily unavailable)
  2011-03-16 19:55           ` Ryan Johnson
@ 2011-04-04 14:52             ` Jon TURNEY
  0 siblings, 0 replies; 15+ messages in thread
From: Jon TURNEY @ 2011-04-04 14:52 UTC (permalink / raw)
  To: cygwin

On 16/03/2011 19:37, Ryan Johnson wrote:
> On 2:59 PM, Henry S. Thompson wrote:
>> Ryan Johnson writes:
>>
>>> BTW, I found a good way to identify, if not fix, BLODA: given an app
>>> which loads no libraries at runtime -- such as 'ls' -- any dlls
>>> mentioned in /proc/$$/maps which cygcheck does not mention are
>>> probably dodgy. In my case, Windows Live (which I didn't think was
>>> even installed on my machine) has injected a WLIDNSP.DLL ("Microsoft
>>> Windows Live ID Namespace Provider") in all my processes.
>> This would be super-cool if true, but it doesn't work for me. . .
>>
>> If I try, I find
>>
>>   C:\Windows\system32\ntmarta.dll
>>   C:\Windows\SysWOW64\sechost.dll
>>   C:\Windows\syswow64\WLDAP32.dll
>>
>> in /proc/[ls procid]/maps but not in cygcheck output, but none of
>> those are BLODA, right?
>>
>> [Note also that maps shows many things in syswow64 which cygcheck
>> shows in system32, but presumably that's because cygcheck itself is a
>> 32-bit app, is it?]
>>
> Interesting...
> 
> $ join -i -v 1 <(cat /proc/$$/maps | sed 's;^.*/;;' | sort -f) \
>     <(cygcheck $(cat /proc/$$/winexename) | sed 's;^.*\\;;' | sort -f)
[list cut]
> 
> The above shows all dlls loaded by the process which are not linked in at
> compile time. Does bash really load so many dynamic libraries, or is cygcheck
> missing things?

System DLLs dynamically load other DLLs, both for extensibility and for
performance (delay-loading), so this list doesn't really tell you anything
interesting.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Debugging help for fork failure: resource temporarily unavailable
  2011-03-15 15:04       ` Ryan Johnson
  2011-03-15 17:52         ` BLODA detection (was Re: Debugging help for fork failure: resource temporarily unavailable) Henry S. Thompson
@ 2011-04-04 18:40         ` Jon TURNEY
  2011-04-13 22:21           ` Ryan Johnson
  1 sibling, 1 reply; 15+ messages in thread
From: Jon TURNEY @ 2011-04-04 18:40 UTC (permalink / raw)
  To: cygwin

On 15/03/2011 13:53, Ryan Johnson wrote:
> All of this assumes Windows is consistent in choosing locations when conflicts

It's assumed that CreateProcess() produces the same layout, yes.

> are involved. IOW, consider the case that B depends on A, with A and B both
> conflicting with a later-loaded C. The first time A and C load Windows will
> choose alternate locations for them, and if that order changes in the child,
> it's totally possible that A ends up in the child where C was in the parent.

I'm not sure what you mean here: all that matters is duplicating the layout at
the point in time when fork() occurs.

>>> Incidentally, this
>>> same problem would arise if a BLODA injected a DLL into the process -- that
>>> DLL would be on the todo list for fork() to process (because it was also
>>> injected into the parent process), but would already be loaded by the time we
>>> try to remap it. Also, if we do want to force Windows not to put a dll in a
>>> certain address, wouldn't it make more sense to reserve the (wrong) space it
>>> went into on the first try? Right now if the offending location is higher than
>>> the one we want, nothing stops Windows from just putting it right back in its
>>> old spot because the code only reserves locations lower than the desired one.
>>>
>>> Is this accurate or am I missing something here?
>> I'm not sure that particular scenario with injected DLLs is possible, as the
>> list traversed in dll_list::load_after_fork() is only of dynamically loaded
>> cygwin-based DLLs?
> Oh, so injected dlls, though not statically linked in, still wouldn't be on
> this list?
> 
> BTW, I found a good way to identify, if not fix, BLODA: given an app which
> loads no libraries at runtime -- such as 'ls' -- any dlls mentioned in
> /proc/$$/maps which cygcheck does not mention are probably dodgy. In my case,
> Windows Live (which I didn't think was even installed on my machine) has
> injected a WLIDNSP.DLL ("Microsoft Windows Live ID Namespace Provider") in all
> my processes.
> 
>> $ objdump -p /usr/bin/cygpyglib-2.0-python2.6-0.dll | grep ^ImageBase
>> ImageBase               6aa40000
>>
>> $ objdump -p /usr/bin/cygglib-2.0-0.dll | grep ^ImageBase
>> ImageBase               6aa40000
>>
>> C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
>>     1263 [main] python 3008 dll_list::load_after_fork: reserve_upto 0x18C40000
>> to try to force it to load there
>>     1473 [main] python 3008 dll_list::load_after_fork: LoadLibrary
>> C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
>>     1620 [main] python 3008 C:\cygwin\bin\python.exe: *** fatal error - unable
>> to remap C:\cygwin\bin\cygglib-2.0-0.dll to same address as parent: 0x18C40000
>> != 0x6AA40000
>>
>> and I've confirmed that in the parent, cygpyglib-2.0-python2.6-0.dll loads at
>> 0x6AA40000 and cygglib-2.0-0.dll loads at 0x18C40000.
>>
>> At a wild guess, it looks like LoadLibraryEx() maps DLLs into memory starting
>> from the top of the dependency chain, but then calls the DLL's entry point
>> starting from the bottom of the dependency chain (which makes all kinds of
>> sense, but leads to this inversion of the load order in the child)
>>
> So the problem basically arises because dlls in the child are not actually
> loaded in the same order as in the parent?  In this case I assume that
> cygpyglib depends on cygglib,

There's no need to assume it because I wrote "cygpyglib-2.0-python2.6-0.dll
(glib bindings for python) depends on cygglib-2.0-0.dll"

> which suggests that we could avoid a lot of
> trouble by handling dependent children first.

That's the problem in the particular case I've looked at. I wouldn't assume that all
or even most remap failures are caused by that scenario.

> Also, it looks like the above is exactly the case I suspected -- the offending
> dll attempts to load *higher* than where we want it, so reserving space below
> does nothing for us.
>>> I assume there's a way to enumerate the dlls loaded in a given process; would
>>> it make sense to use a three-step algorithm?
>>> 1. Unload all currently-loaded dlls, complaining loudly to stderr or a log
>>> file (these are due to BLODA and deserve to be called out)
>>> 2. Load without deps every DLL and make sure it lands at the right address
>>> (using memory reservation tricks if needed)
>>> 3. Reload with deps every DLL. Presumably once it has landed correctly once it
>>> will do so thereafter (the current code assumes this, at least)
>> Doing 2 & 3 is an interesting idea, the first call to let you pin it at a
>> particular address and the second to make it executable.
>>
>> I've no idea what happens, but unfortunately, the comments in
>> dll_list::load_after_fork() seem to suggest this doesn't work, as the DLLs
>> entry point doesn't get called the second time it's loaded.
> The code currently unloads the library completely and then reloads it normally,
> which I assumed was to ensure entry points get called.
> 
>>> In theory, the first step might allow cygwin to resist dll injection (maybe on
>>> an opt-out basis?), though I don't know what the consequences of that choice
>>> would be.
>>>
>>> The third step would be significantly easier if we had a dependency graph so
>>> that we could ensure dependencies always get processed before they're needed,
>>> but I don't know if that's feasible. How expensive/embeddable is cygcheck?
>> Another idea (assuming my guess about LoadLibrary() behaviour above is
>> correct) would be to have dlopen() rather than simply call LoadLibrary() on a
>> DLL, construct the dependency tree of the DLL it's been asked to open and load
>> the DLLs starting from the bottom, so that the order of loading into memory
>> matches the order which entry points are called (and hence the order in
>> dll_list)? (This would have the advantage of not making fork() even more
>> heavyweight)
> Some variant of objdump -p $THE_DLL | grep 'DLL Name'?
> 
> It might also make sense for the parent process to record some ordering
> information at dlopen time in case it forks later. Given that the dlls are
> opening anyway it would probably be cheap to do it then. Just build a tree of
> all dlls which the current dlopen() triggers dlopen() calls for. Alternatively
> (simpler?) just make dlopen() add dlls to its list just before it returns.
> That way, any recursive calls will add the dependencies to the list first. No
> special data structures needed. Only problem is, I can't see where in the
> source this magical list is generated in the first place :(

I suggest you read how-startup-shutdown-works.txt and then observe that
dll_list::alloc() is called by dll_dllcrt0_1().

>> Alternatively, maybe all that is needed is a slightly more complex approach to
>> forcing the DLL to load at a particular address?  If reserve_upto() has been
>> called, but it loads higher than that, can we assume load order inversion has
>> occurred, and try to block it from loading at its preferred address by
>> VirtualAlloc()-ing there as well? I think I might even try to write a patch to
>> do that...
> The second approach might be easier to hack together quickly, but the first
> would actually make fork() more efficient and eliminate a lot of code: it's
> likely that all the rebasing/remapping fallbacks could disappear.
> 
> A third alternative would be to traverse the remaining list of dlls and find
> the one that we should have loaded first. This would have to be recursive to
> handle the case where several dlls map to the same base, but might otherwise
> be workable.

I look forward to reading your patches :-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Re: Debugging help for fork failure: resource temporarily unavailable
  2011-04-04 18:40         ` Debugging help for fork failure: resource temporarily unavailable Jon TURNEY
@ 2011-04-13 22:21           ` Ryan Johnson
  2011-04-14  6:47             ` Ryan Johnson
  0 siblings, 1 reply; 15+ messages in thread
From: Ryan Johnson @ 2011-04-13 22:21 UTC (permalink / raw)
  To: Jon TURNEY; +Cc: cygwin

On 2:59 PM, Jon TURNEY wrote:
> On 15/03/2011 13:53, Ryan Johnson wrote:
>> All of this assumes Windows is consistent in choosing locations when conflicts
> It's assumed that CreateProcess() produces the same layout, yes.
This assumption is due to what?
- Documented Windows feature?
- An observation which holds in spite of tests/abuse which try to 
exercise corner cases?
- A wish which comes true often enough to get by in most cases?

> I suggest you read how-startup-shutdown-works.txt and then observe that
> dll_list::alloc() is called by dll_dllcrt0_1()
Thanks for the pointer. Unfortunately, it doesn't tell much about 
fork(), nor does it make clear when memory for dependent dlls is 
assigned. The loader calls dependent dllmains right before entering the 
owner's dllmain, but that doesn't say when the address space is 
reserved. There's the complexity of deferred vs. immediate dllmain calls 
as well.

> I look forward to reading your patches :-)
I think it's still rather premature to be cooking up a patch, 
unfortunately -- I'm not convinced I know yet where the real problem 
lies. Without some data to back up my speculation (which seems hard to 
come by), any patch I might write would have a high probability of 
joining other accumulated band-aids such as reserve_upto().

Open questions (for my ignorant self, at least) include:
- Does Windows always load a given dll at the same address when its base 
address is already occupied?
- Does fork() always load DLLs in the same order that the parent loaded 
them? This would probably be helpful to know even in cases where no 
error arises, because it's a necessary precursor to fork failures, and 
the code seems to assume it's true.
- Is it ever possible for fork() to unload BLODA dlls?
- Do injected dlls arrive before or after statically-linked dlls? Or can 
it be either one?
- At fork time, does cygwin mogrify some generic child process to look 
like the parent, or is the child another "normal" run of the parent's 
executable image followed by plastic surgery to make heap, stack, etc. 
match? I had been assuming the former, but should probably ask.

If statically-linked dlls ever need to be loaded manually by fork(), 
and injected dlls arrive after statically-linked ones, then that might 
explain BLODA right there: if static.dll and bloda.dll both have the 
same base address, dll injection occurs after statically-linked 
dlls, and fork() ever needs to load static.dll manually, then that would 
be at least one way BLODA-related fork failures can arise.

Is there any way to have cygwin report dlls, base addresses, and 
windows-assigned addresses for dlls when a fork fails?
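One low-tech approach would be to capture /proc/PID/maps for both parent and child at the point of failure and diff them. The sketch below only does the comparison; where to hook it into fork() is the open question. It assumes each maps line has the "START-END PERMS OFFSET DEV INODE PATH" layout Cygwin prints (e.g. "66000000-66012000 rw-s 660011F0 2C36:17C8 3940649674730545 /home/Ryan/experiments/fork-tests/cygbar.dll"); dll_bases, compare_maps and Mismatch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parse /proc/PID/maps text into path -> start address.
// Only the first mapping seen for each path is kept.
std::map<std::string, std::uint64_t> dll_bases(const std::string& maps_text) {
    std::map<std::string, std::uint64_t> bases;
    std::istringstream in(maps_text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode, path;
        if (!(fields >> range >> perms >> offset >> dev >> inode)) continue;
        std::getline(fields >> std::ws, path);  // rest of the line, may contain spaces
        if (path.empty()) continue;             // anonymous mapping
        std::uint64_t start = std::stoull(range.substr(0, range.find('-')), nullptr, 16);
        bases.emplace(path, start);             // emplace keeps the first entry
    }
    return bases;
}

struct Mismatch {
    std::string path;
    std::uint64_t parent_addr;  // 0 if not mapped in the parent
    std::uint64_t child_addr;   // 0 if not mapped in the child
};

// Every DLL whose start address differs between parent and child, including
// DLLs that appear in only one of the two (e.g. something injected late).
std::vector<Mismatch> compare_maps(const std::string& parent_maps, const std::string& child_maps) {
    std::map<std::string, std::uint64_t> parent = dll_bases(parent_maps);
    std::map<std::string, std::uint64_t> child = dll_bases(child_maps);
    std::vector<Mismatch> out;
    for (const auto& p : parent) {
        auto it = child.find(p.first);
        std::uint64_t c = it == child.end() ? 0 : it->second;
        if (c != p.second) out.push_back({p.first, p.second, c});
    }
    for (const auto& c : child)
        if (!parent.count(c.first)) out.push_back({c.first, 0, c.second});
    return out;
}
```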

Thoughts?
Ryan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Re: Re: Debugging help for fork failure: resource temporarily unavailable
  2011-04-13 22:21           ` Ryan Johnson
@ 2011-04-14  6:47             ` Ryan Johnson
  2011-04-14 18:21               ` Ryan Johnson
  0 siblings, 1 reply; 15+ messages in thread
From: Ryan Johnson @ 2011-04-14  6:47 UTC (permalink / raw)
  Cc: Jon TURNEY, cygwin

On 2:59 PM, Ryan Johnson wrote:
> On 2:59 PM, Jon TURNEY wrote:
>> I look forward to reading your patches :-)
> I think it's still rather premature to be cooking up a patch, 
> unfortunately -- I'm not convinced I know yet where the real problem 
> lies. Without some data to back up my speculation (which seems hard to 
> come by), any patch I might write would have a high probability of 
> joining other accumulated band-aids such as reserve_upto().
>
> Open questions (for my ignorant self, at least) include:
> - Does Windows always load a given dll at the same address when its 
> base address is already occupied?
> - Does fork() always load DLLs in the same order that the parent 
> loaded them? This would probably be helpful to know even in cases 
> where no error arises, because it's a necessary precursor to fork 
> failures, and the code seems to assume it's true.
> - Is it ever possible for fork() to unload BLODA dlls?
> - Do injected dlls arrive before or after statically-linked dlls? Or 
> can it be either one?
> - At fork time, does cygwin mogrify some generic child process to look 
> like the parent, or is the child another "normal" run of the parent's 
> executable image followed by plastic surgery to make heap, stack, etc. 
> match? I had been assuming the former, but should probably ask.

Update: I wrote a very simple program whose main() prints out the 
contents of /proc/self/maps, forks, calls foo() and bar(), and finally 
(if the parent) calls wait().

The trick is, foo() and bar() reside in cygfoo.dll and cygbar.dll 
respectively, which I compiled to have the same base address: 0x66000000.

The running binary often, but not always, results in those annoying 
"exception::handle: Exception: STATUS_ACCESS_VIOLATION" messages (the 
process otherwise appears to complete normally most of the time). 
However, once in a while the child fails to spawn, with no particular 
error message to advertise that fact.
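For anyone who wants to repeat the experiment, this is a guess at what the driver might look like; the source wasn't posted, so it is a hypothetical reconstruction. To keep it self-contained, foo() and bar() are defined locally; in the real test they live in cygfoo.dll and cygbar.dll, each linked with the same image base (with GNU ld on PE targets, something like -Wl,--image-base,0x66000000). run_fork_test, print_maps and InitFini are made-up names; the init/fini markers only mimic the ones in the output.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-ins: in the real experiment these live in cygfoo.dll and cygbar.dll,
// both linked at base address 0x66000000.
int foo() { return 1; }
int bar() { return 2; }

// Mimics the "+ + + fork.cpp init" / "* * * fork.cpp fini" markers.
struct InitFini {
    InitFini() { std::cout << "+ + + fork.cpp init" << std::endl; }
    ~InitFini() { std::cout << "* * * fork.cpp fini" << std::endl; }
} init_fini_marker;

// Dump this process's memory map, as the test program does before forking.
void print_maps() {
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) std::cout << line << '\n';
}

// Returns the child's exit status as seen by the parent, or -1 on failure.
int run_fork_test() {
    print_maps();
    std::cout << "Before fork" << std::endl;  // flush so the child doesn't repeat output
    pid_t child = fork();
    int sum = foo() + bar();  // parent and child both call into the two DLLs
    if (child == 0) std::exit(sum == 3 ? 0 : 1);  // exit() runs static destructors
    std::cout << "Parent after fork (child: " << child << ")" << std::endl;
    if (child < 0) return -1;
    int status = 0;
    if (waitpid(child, &status, 0) != child) return -1;  // parent waits for the child
    std::cout << "Parent exiting" << std::endl;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```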

Running inside gdb (inside a plain cygwin window) gives the following 
(I'm on Win7 x64, with all the latest packages as of yesterday afternoon):
> $ gdb
> GNU gdb 6.8.0.20080328-cvs (cygwin-special)
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-pc-cygwin".
> (gdb) file fork
> Reading symbols from /home/Ryan/experiments/fork-tests/fork...done.
> (gdb) run
> Starting program: /home/Ryan/experiments/fork-tests/fork
> [New thread 8864.0x2120]
> Error: dll starting at 0x77190000 not found.
> Error: dll starting at 0x75650000 not found.
> Error: dll starting at 0x77190000 not found.
> Error: dll starting at 0x76d20000 not found.
> [New thread 8864.0x2710]
> + + + bar.cpp init
> + + + foo.cpp init
> + + + fork.cpp init
> 00400000-00410000 rw-s 00401000 2C36:17C8 33776997205430206   
> /home/Ryan/experiments/fork-tests/fork.exe
> 775E0000-77760000 r-xs 00000000 2C36:17C8 281474976927378     
> /cygdrive/c/Windows/SysWOW64/ntdll.dll
> 75650000-75760000 r-xs 756632D3 2C36:17C8 281474976927037     
> /cygdrive/c/Windows/syswow64/kernel32.dll
> 75350000-75396000 r-xs 75357478 2C36:17C8 281474976925120     
> /cygdrive/c/Windows/syswow64/KERNELBASE.dll
> 66000000-66012000 rw-s 660011F0 2C36:17C8 3940649674730545    
> /home/Ryan/experiments/fork-tests/cygbar.dll
> 61000000-61450000 r-xs 6106F960 2C36:17C8 844424930325032     
> /usr/bin/cygwin1.dll
> 75A60000-75B00000 r-xs 75A749E5 2C36:17C8 281474976927159     
> /cygdrive/c/Windows/syswow64/ADVAPI32.DLL
> 75050000-750FC000 rw-s 7505A472 2C36:17C8 281474976749314     
> /cygdrive/c/Windows/syswow64/msvcrt.dll
> 76840000-76859000 r-xs 76844975 2C36:17C8 281474976749841     
> /cygdrive/c/Windows/SysWOW64/sechost.dll
> 76750000-76840000 r-xs 76760569 2C36:17C8 281474976924963     
> /cygdrive/c/Windows/syswow64/RPCRT4.dll
> 74CD0000-74D30000 r-xs 74CEA3B3 2C36:17C8 281474976924512     
> /cygdrive/c/Windows/syswow64/SspiCli.dll
> 74CC0000-74CCC000 r-xp 74CC10E1 2C36:17C8 281474976748415     
> /cygdrive/c/Windows/syswow64/CRYPTBASE.dll
> 67F00000-67F0F000 rw-s 67F08920 2C36:17C8 562949954003711     
> /usr/bin/cyggcc_s-1.dll
> 6C480000-6C545000 rw-s 6C485110 2C36:17C8 562949954003739     
> /usr/bin/cygstdc++-6.dll
> 002B0000-002C2000 rw-p 002B11F0 2C36:17C8 2533274791177101    
> /home/Ryan/experiments/fork-tests/cygfoo.dll
> 753A0000-754A0000 rw-p 753BB6ED 2C36:17C8 281474976926904     
> /cygdrive/c/Windows/system32/user32.dll
> 74FC0000-75050000 rw-p 74FD6343 2C36:17C8 281474976926610     
> /cygdrive/c/Windows/syswow64/GDI32.dll
> 754D0000-754DA000 rw-p 754D36A0 2C36:17C8 281474976749103     
> /cygdrive/c/Windows/syswow64/LPK.dll
> 757D0000-7586D000 rw-p 75803FD7 2C36:17C8 281474976927082     
> /cygdrive/c/Windows/syswow64/USP10.dll
> 754E0000-75540000 r-xp 754F158F 2C36:17C8 281474976924115     
> /cygdrive/c/Windows/system32/IMM32.DLL
> 74EF0000-74FBC000 rw-p 74EF168B 2C36:17C8 281474976749206     
> /cygdrive/c/Windows/syswow64/MSCTF.dll
> 76980000-76985000 rw-p 76981438 2C36:17C8 281474976749672     
> /cygdrive/c/Windows/system32/psapi.dll
>
> Before fork
>       0 [main] fork 9472 exception::handle: Exception: 
> STATUS_ACCESS_VIOLATION
>     559 [main] fork 9472 open_stackdumpfile: Dumping stack trace to 
> fork.exe.stackdump
>       0 [main] fork 9132 exception::handle: Exception: 
> STATUS_ACCESS_VIOLATION
>     525 [main] fork 9132 open_stackdumpfile: Dumping stack trace to 
> fork.exe.stackdump
>       0 [main] fork 7812 exception::handle: Exception: 
> STATUS_ACCESS_VIOLATION
>     531 [main] fork 7812 open_stackdumpfile: Dumping stack trace to 
> fork.exe.stackdump
>       0 [main] fork 7648 exception::handle: Exception: 
> STATUS_ACCESS_VIOLATION
>     521 [main] fork 7648 open_stackdumpfile: Dumping stack trace to 
> fork.exe.stackdump
>       0 [main] fork 1960 exception::handle: Exception: 
> STATUS_ACCESS_VIOLATION
>     657 [main] fork 1960 open_stackdumpfile: Dumping stack trace to 
> fork.exe.stackdump
>       0 [main] fork 4480 exception::handle: Exception: 
> STATUS_ACCESS_VIOLATION
>     914 [main] fork 4480 open_stackdumpfile: Dumping stack trace to 
> fork.exe.stackdump
>       0 [main] fork 8864 fork: child -1 - died waiting for longjmp 
> before initialization, retry 0, exit code 0x600, errno 11
> Parent after fork (child: -1)
> Parent exiting
> * * * fork.cpp fini
> * * * foo.cpp fini
> * * * bar.cpp fini
>
> Program exited normally.

The above raises several interesting questions:

1. Why doesn't /proc/self/maps contain all the dlls gdb complains about? 
0x75650000 is kernel32.dll, but there's no sign of 0x77190000 or 0x76d20000. 
I tried nirsoft's 'InjectedDLL' but none of the dlls it finds have those 
bases, and windbg doesn't report them either.

2. What determines which of the many bad things can happen at fork() 
time? I've seen "resource temporarily unavailable", "died waiting for 
longjmp", and now this "STATUS_ACCESS_VIOLATION" (which invariably 
happens an even number of times but is usually not fatal).

3. What code is raising the access violation, and is there a way to make 
gdb catch it?
> (gdb) catch load
> catch of library loads not yet implemented on this platform
> (gdb) catch throw
> Function "__cxa_throw" not defined.
> (gdb) catch exception
> Unable to insert catchpoint.  Is this an Ada main program?
> (gdb) catch signal SIGSEGV
> Catch of signal not yet implemented

4. Strace shows that each pair of access violations corresponds to a 
failed attempt at forking. I guess after three failures cygwin gives up 
and triggers the waiting-for-longjmp error?

Unfortunately I haven't been able to reproduce the resource unavailable 
flavor of error yet...

Thoughts?
Ryan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Debugging help for fork failure: resource temporarily unavailable
  2011-04-14  6:47             ` Ryan Johnson
@ 2011-04-14 18:21               ` Ryan Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Ryan Johnson @ 2011-04-14 18:21 UTC (permalink / raw)
  To: Ryan Johnson; +Cc: Jon TURNEY, cygwin

On 2:59 PM, Ryan Johnson wrote:
> I wrote a very simple program whose main() prints out the contents of 
> /proc/self/maps, forks, calls foo() and bar(), and finally (if the 
> parent) calls wait().
>
> The trick is, foo() and bar() reside in cygfoo.dll and cygbar.dll 
> respectively, which I compiled to have the same base address: 0x66000000.
>
> The running binary often, but not always, results in those annoying 
> "exception::handle: Exception: STATUS_ACCESS_VIOLATION" messages (the 
> process otherwise appears to complete normally most of the time). 
> However, once in a while the child fails to spawn, with no particular 
> error message to advertise that fact.

After downloading windbg, I was able to capture the stack traces of the 
access violations. It turns out there's no retry code at all; it's just 
that some access violations cause the exception handler to crash with a 
variable number of additional violations.

At this point I'm officially in over my head. The output below is 
hopefully useful to somebody who knows a lot more than I do about the 
code which is involved here.

Help? Ideas?
Ryan


The first access violation occurred in la-la land:
WARNING: FrameIP not in any known module. Following frames may be wrong.
610203ea 0x5c0073
6111ccca cygwin1!dlerror+0x6aa
610050ab cygwin1!wordfree+0x4d4a
61007040 cygwin1!setprogname+0x36bb
61004c96 cygwin1!dll_crt0__FP11per_process+0x710
61004d3b cygwin1!setprogname+0x32a6
004013c2 cygwin1!setprogname+0x334b
00401015 image00000000_00400000+0x13c2
756633ca image00000000_00400000+0x1015
77619ed2 kernel32!BaseThreadInitThunk+0x12
77619ea5 ntdll32!RtlInitializeExceptionChain+0x63
00000000 ntdll32!RtlInitializeExceptionChain+0x36

Passing the above through gdb's disassembler:
0x5c0073
<_ZN10per_module9run_dtorsEv+24>:    call   *%eax
<exit+21>:   call   0x61144970 <__call_exitprocs>
<cygwin_exit+22>:    call   0x6111ccb0 <exit>
<_cygwin_exit_return+526>:   call   0x61005090 <cygwin_exit>
<cygwin_crt0+28>:    call   *0x405120
<mainCRTStartup+16>: call   0x4013a0 <cygwin_crt0>
kernel32!BaseThreadInitThunk+0x12
... ntdll calls ...

Continuing with windbg from there gave the following:
cYgSiGw00f 11 0x2030 0x28CE8C
HEAP[fork.exe]: HEAP: Free Heap block 2b3ac0 modified at 2b4000 after it was freed
ntdll!EtwpCreateEtwThread+0x1721

Continuing on again:
WARNING: Stack unwind information not available. Following frames may be 
wrong.
660010d9 cyggcc_s_1!_deregister_frame_info_bases+0x48
610203ea cygbar+0x10d9
6111ccca cygwin1!dlerror+0x6aa
77638fba ntdll32!RtlQueryEnvironmentVariable+0x241
77638e5c ntdll32!LdrShutdownProcess+0x141
75667a25 ntdll32!RtlExitUserProcess+0x74
6109d683 kernel32!ExitProcess+0x15
61005360 cygwin1!cygwin32_posix_path_list_p+0xa2f3
610283e1 cygwin1!setprogname+0x3970
61029034 cygwin1!getenv+0x4611
77626ab9 cygwin1!getenv+0x5264
77626a8b ntdll32!RtlDosSearchPath_Ustr+0xada
775f0143 ntdll32!RtlDosSearchPath_Ustr+0xaac
6111ccca ntdll32!KiUserExceptionDispatcher+0xf
610050ab cygwin1!wordfree+0x4d4a
61007040 cygwin1!setprogname+0x36bb
(rest continues as in the first stack trace)

Again decoding with gdb:
<__gcc_deregister_frame+48>: movl   $0x66003000,(%esp)
<_ZN10per_module9run_dtorsEv+24>:    call   *%eax
<exit+21>:   call   0x61144970 <__call_exitprocs>
... ntdll calls...
kernel32!ExitProcess+0x15
<_ZN5pinfo4exitEm+270>:      call   0x61159c90 <ExitProcess@4>
<_Z7do_exiti@4+587>: call   0x6109d570 <_ZN5pinfo4exitEm>
<_ZN7_cygtls11signal_exitEi+284>:    call   0x61005110 <_Z7do_exiti@4>
<_ZN9exception6handleEP17_EXCEPTION_RECORDP15_exception_listP8_CONTEXTPv+1295>:      call   0x610282c0 <_ZN7_cygtls11signal_exitEi>
... ntdll calls ...
ntdll32!KiUserExceptionDispatcher+0xf
... below this is the stack trace from the original access violation ...
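A trace like this one is really two stacks glued together. Frames up to the KiUserExceptionDispatcher frame belong to the handler that was running, and the frames after it belong to the code that originally faulted. When the chain gets longer, a small helper that splits at each dispatcher frame makes the nesting explicit. The sketch below is hypothetical and only as reliable as windbg's frame labels, which the trace itself warns may be wrong. The sample labels are abbreviated from the second windbg trace.

```python
def split_at_dispatcher(frames):
    """Split an innermost-first list of frame labels into exception contexts.

    Frames up to and including a KiUserExceptionDispatcher frame belong to
    the handler that was running; the frames after it are the context that
    faulted and was being dispatched.
    """
    segments = [[]]
    for frame in frames:
        segments[-1].append(frame)
        if "KiUserExceptionDispatcher" in frame:
            segments.append([])
    return [seg for seg in segments if seg]


if __name__ == "__main__":
    frames = [
        "cyggcc_s_1!_deregister_frame_info_bases+0x48",
        "cygbar+0x10d9",
        "ntdll32!LdrShutdownProcess+0x141",
        "kernel32!ExitProcess+0x15",
        "ntdll32!KiUserExceptionDispatcher+0xf",
        "cygwin1!wordfree+0x4d4a",
        "cygwin1!setprogname+0x36bb",
    ]
    for depth, seg in enumerate(split_at_dispatcher(frames)):
        print(depth, seg)
```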

Continuing yet again gives a third access violation inside ntdll32, due 
to a second call to KiUserExceptionDispatcher:
WARNING: Stack unwind information not available. Following frames may be wrong.
0028c080 77626ab9 ntdll32!RtlUlonglongByteSwap+0x1605d
0028c0a4 77626a8b ntdll32!RtlDosSearchPath_Ustr+0xada
0028c154 775f0143 ntdll32!RtlDosSearchPath_Ustr+0xaac
0028c4bc 660010d9 ntdll32!KiUserExceptionDispatcher+0xf
0028c4cc 610203ea cygbar+0x10d9
... continues as before ...

At this point we're in an endless loop: each access violation in the
user exception dispatcher makes the dispatcher raise yet another access
violation. I figured this would eventually end in a stack overflow, so I
detached.

Surprisingly, the output of the cygwin app contained no mention of
access violations. However, the errors prevented the child process from
running the global destructors in its DLLs:
Child exiting
* * * fork.cpp fini
Parent after fork (child: 7944)
Parent exiting
* * * fork.cpp fini
* * * foo.cpp fini
* * * bar.cpp fini
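The missing lines are easy to spot by eye here: the child never printed foo.cpp or bar.cpp fini. Over many reruns, a tiny checker may help. The following is a hypothetical sketch. It assumes the output format shown above (a "Child exiting" or "Parent exiting" marker followed by that process's "* * * <file> fini" lines). It also assumes the two processes' output does not interleave, which is not guaranteed in general.

```python
def missing_finis(output):
    """Return destructor markers the parent printed but the child did not.

    Assumes each process prints an "... exiting" line followed by its own
    "* * * <file> fini" lines, and that the two processes' output is not
    interleaved.
    """
    seen = {"child": set(), "parent": set()}
    current = None
    for line in output.splitlines():
        if line.startswith("Child exiting"):
            current = "child"
        elif line.startswith("Parent exiting"):
            current = "parent"
        elif line.startswith("* * * ") and current is not None:
            seen[current].add(line[len("* * * "):].strip())
    return sorted(seen["parent"] - seen["child"])


if __name__ == "__main__":
    out = """\
Child exiting
* * * fork.cpp fini
Parent after fork (child: 7944)
Parent exiting
* * * fork.cpp fini
* * * foo.cpp fini
* * * bar.cpp fini
"""
    print(missing_finis(out))
```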


Rerunning the process a few times turned up a different access 
violation, this one definitely fork-related:
WARNING: Stack unwind information not available. Following frames may be wrong.
0028c7dc 61020d38 cygwin1!dlerror+0x74b
0028c7fc 61004c96 cygwin1!dlfork+0x808
0028ce64 00000000 cygwin1!setprogname+0x32a6

Decoded:
<_ZN3dll4initEv+27>: mov    %eax,(%edx)
<_Z13dll_dllcrt0_1Pv+307>:   call   0x61020470 <_ZN3dll4initEv>
<_ZN7_cygtls5call2EPFmPvS0_ES0_S0_+52>:      call   *%ebp

Continuing twice more:
cYgSiGw00f 11 0x20B4 0x28CE8C
ModLoad: 00000000`76980000 00000000`76985000   C:\Windows\SysWOW64\psapi.dll
(1918.20b4): Invalid handle - code c0000008 (first chance)
ntdll!KiRaiseUserExceptionDispatcher+0x3a:
00000000`774512f7 8b8424c0000000  mov     eax,dword ptr [rsp+0C0h] ss:00000000`0008e310=c0000008
...
(1918.20b4): Invalid handle - code c0000008 (!!! second chance !!!)
ntdll32!ZwClose+0x12:
775ff9e2 83c404          add     esp,4

Sure enough, cygwin noticed this time:
       1 [main] fork 6424 exception::handle: Exception: STATUS_ACCESS_VIOLATION
    2997 [main] fork 6424 open_stackdumpfile: Dumping stack trace to fork.exe.stackdump
       1 [main] fork 9204 child_info::sync: wait failed, pid 6424, Win32 error 1812
221888192 [main] fork 9204 fork: child -1 - died waiting for longjmp before initialization, retry 10, exit code 0x1000000, errno 11
Parent after fork (child: -1)
Parent exiting
* * * fork.cpp fini
* * * foo.cpp fini
* * * bar.cpp fini
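The "errno 11" in that fork diagnostic is EAGAIN in Cygwin's errno numbering, which is the "resource temporarily unavailable" error from this thread's subject. To pick these failures out of long build logs, a small parser may help. The following is a hedged sketch. The regex is inferred from this single sample line, so the exact message format is an assumption, and the function name is hypothetical.

```python
import re

# EAGAIN is 11 in Cygwin's (newlib's) <sys/errno.h>.
CYGWIN_EAGAIN = 11

# Pattern inferred from the one fork failure line shown above.
FORK_FAIL = re.compile(
    r"fork: child (?P<pid>-?\d+) - (?P<reason>.+?), "
    r"retry (?P<retry>\d+), exit code (?P<exit>0x[0-9a-fA-F]+), "
    r"errno (?P<errno>\d+)"
)


def parse_fork_failure(line):
    """Return the fields of a Cygwin fork failure line, or None."""
    m = FORK_FAIL.search(line)
    if m is None:
        return None
    err = int(m["errno"])
    return {
        "child": int(m["pid"]),
        "reason": m["reason"],
        "retry": int(m["retry"]),
        "exit_code": int(m["exit"], 16),
        "errno": err,
        "is_eagain": err == CYGWIN_EAGAIN,
    }


if __name__ == "__main__":
    sample = ("221888192 [main] fork 9204 fork: child -1 - died waiting for "
              "longjmp before initialization, retry 10, exit code 0x1000000, "
              "errno 11")
    print(parse_fork_failure(sample))
```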




--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


end of thread, other threads:[~2011-04-14 14:20 UTC | newest]

Thread overview: 15+ messages
2011-03-05 22:17 Debugging help for fork failure: resource temporarily unavailable Ryan Johnson
2011-03-06 15:03 ` chm
2011-03-07 15:29 ` Ryan Johnson
2011-03-09 10:34 ` Corinna Vinschen
2011-03-09 17:04   ` Christopher Faylor
2011-03-09 17:53   ` Ryan Johnson
2011-03-12 20:57     ` Jon TURNEY
2011-03-15 15:04       ` Ryan Johnson
2011-03-15 17:52         ` BLODA detection (was Re: Debugging help for fork failure: resource temporarily unavailable) Henry S. Thompson
2011-03-16 19:55           ` Ryan Johnson
2011-04-04 14:52             ` Jon TURNEY
2011-04-04 18:40         ` Debugging help for fork failure: resource temporarily unavailable Jon TURNEY
2011-04-13 22:21           ` Ryan Johnson
2011-04-14  6:47             ` Ryan Johnson
2011-04-14 18:21               ` Ryan Johnson
