* Improvements to fork handling (2/5)
@ 2011-05-11 18:31 Ryan Johnson
2011-05-22 1:44 ` Christopher Faylor
2011-05-28 20:50 ` Problems with: " Christopher Faylor
0 siblings, 2 replies; 23+ messages in thread
From: Ryan Johnson @ 2011-05-11 18:31 UTC (permalink / raw)
To: cygwin-patches
[-- Attachment #1: Type: text/plain, Size: 776 bytes --]
Hi all,
This patch has the parent sort its dll list topologically by
dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
in dependencies automatically, and the latter would then not benefit
from the code which "encourages" them to land in the right places. The
dependency tracking is achieved using a simple class which allows to
introspect a mapped dll image and pull out the dependencies it lists.
The code currently rebuilds the dependency list at every fork rather
than attempt to update it properly as modules are loaded and unloaded.
Note that the topsort optimization affects only cygwin dlls, so any
windows dlls which are pulled in dynamically (directly or indirectly)
will still impose the usual risk of address space clobbers.
Ryan
[-- Attachment #2: fork-topsort.patch --]
[-- Type: text/plain, Size: 6272 bytes --]
diff --git a/dll_init.cc b/dll_init.cc
--- a/dll_init.cc
+++ b/dll_init.cc
@@ -116,6 +116,18 @@ dll_list::operator[] (const PWCHAR name)
return NULL;
}
+/* Look for a dll based on is short name only (no path) */
+dll *
+dll_list::find_by_modname (const PWCHAR name)
+{
+ dll *d = &start;
+ while ((d = d->next) != NULL)
+ if (!wcscasecmp (name, d->modname))
+ return d;
+
+ return NULL;
+}
+
#define RETRIES 1000
/* Allocate space for a dll struct. */
@@ -152,14 +164,13 @@ dll_list::alloc (HINSTANCE h, per_proces
d->handle = h;
d->has_dtors = true;
d->p = p;
+ d->ndeps = 0;
+ d->deps = NULL;
+ d->modname = wcsrchr (d->name, L'\\');
+ if (d->modname)
+ d->modname++;
d->type = type;
- if (end == NULL)
- end = &start; /* Point to "end" of dll chain. */
- end->next = d; /* Standard linked list stuff. */
- d->next = NULL;
- d->prev = end;
- end = d;
- tot++;
+ append (d);
if (type == DLL_LOAD)
loaded_dlls++;
}
@@ -168,6 +179,119 @@ dll_list::alloc (HINSTANCE h, per_proces
return d;
}
+void
+dll_list::append (dll* d)
+{
+ if (end == NULL)
+ end = &start; /* Point to "end" of dll chain. */
+ end->next = d; /* Standard linked list stuff. */
+ d->next = NULL;
+ d->prev = end;
+ end = d;
+ tot++;
+}
+
+void dll_list::populate_deps (dll* d)
+{
+ WCHAR wmodname[NT_MAX_PATH];
+ pefile* pef = (pefile*) d->handle;
+ PIMAGE_DATA_DIRECTORY dd = pef->idata_dir (IMAGE_DIRECTORY_ENTRY_IMPORT);
+ /* Annoyance: calling crealloc with a NULL pointer will use the
+ wrong heap and crash, so we have to replicate some code */
+ long maxdeps = 4;
+ d->deps = (dll**) cmalloc (HEAP_2_DLL, maxdeps*sizeof (dll*));
+ d->ndeps = 0;
+ for (PIMAGE_IMPORT_DESCRIPTOR id=
+ (PIMAGE_IMPORT_DESCRIPTOR) pef->rva (dd->VirtualAddress);
+ dd->Size && id->Name;
+ id++)
+ {
+ char* modname = pef->rva (id->Name);
+ sys_mbstowcs (wmodname, NT_MAX_PATH, modname);
+ if (dll* dep = find_by_modname (wmodname))
+ {
+ if (d->ndeps >= maxdeps)
+ {
+ maxdeps = 2*(1+maxdeps);
+ d->deps = (dll**) crealloc (d->deps, maxdeps*sizeof (dll*));
+ }
+ d->deps[d->ndeps++] = dep;
+ }
+ }
+
+ /* add one to differentiate no deps from unknown */
+ d->ndeps++;
+}
+
+
+void
+dll_list::topsort ()
+{
+ /* Anything to do? */
+ if (!end)
+ return;
+
+ /* make sure we have all the deps available */
+ dll* d = &start;
+ while ((d = d->next))
+ if (!d->ndeps)
+ populate_deps (d);
+
+ /* unlink head and tail pointers so the sort can rebuild the list */
+ d = start.next;
+ start.next = end = NULL;
+ topsort_visit (d, true);
+
+ /* clear node markings made by the sort */
+ d = &start;
+ while ((d = d->next))
+ {
+ debug_printf ("%W", d->modname);
+ for (int i=1; i < -d->ndeps; i++)
+ debug_printf ("-> %W", d->deps[i-1]->modname);
+
+ /* It would be really nice to be able to keep this information
+ around for next time, but we don't have an easy way to
+ invalidate cached dependencies when a module unloads. */
+ d->ndeps = 0;
+ cfree (d->deps);
+ d->deps = NULL;
+ }
+}
+
+/* A recursive in-place topological sort. The result is ordered so that
+ dependencies of a dll appear before it in the list.
+
+ NOTE: this algorithm is guaranteed to terminate with a "partial
+ order" of dlls but does not do anything smart about cycles: an
+ arbitrary dependent dll will necessarily appear first. Perhaps not
+ surprisingly, Windows ships several dlls containing dependency
+ cycles, including SspiCli/RPCRT4.dll and a lovely tangle involving
+ USP10/LPK/GDI32/USER32.dll). Fortunately, we don't care about
+ Windows DLLs here, and cygwin dlls should behave better */
+void
+dll_list::topsort_visit (dll* d, bool seek_tail)
+{
+ /* Recurse to the end of the dll chain, then visit nodes as we
+ unwind. We do this because once we start visiting nodes we can no
+ longer trust any _next_ pointers.
+
+ We "mark" visited nodes (to avoid revisiting them) by negating
+ ndeps (undone once the sort completes). */
+ if (seek_tail && d->next)
+ topsort_visit (d->next, true);
+
+ if (d->ndeps > 0)
+ {
+ d->ndeps = -d->ndeps;
+ for (long i=1; i < -d->ndeps; i++)
+ topsort_visit (d->deps[i-1], false);
+
+ append (d);
+ }
+}
+
+
dll *
dll_list::find (void *retaddr)
{
diff --git a/dll_init.h b/dll_init.h
--- a/dll_init.h
+++ b/dll_init.h
@@ -52,6 +52,9 @@ struct dll
int count;
bool has_dtors;
dll_type type;
+ long ndeps;
+ dll** deps;
+ PWCHAR modname;
WCHAR name[1];
void detach ();
int init ();
@@ -84,6 +87,13 @@ public:
void detach (void *);
void init ();
void load_after_fork (HANDLE);
+ dll *find_by_modname (const PWCHAR name);
+ void populate_all_deps ();
+ void populate_deps (dll* d);
+ void topsort ();
+ void topsort_visit (dll* d, bool goto_tail);
+ void append (dll* d);
+
dll *inext ()
{
while ((hold = hold->next))
@@ -109,6 +119,23 @@ public:
dll_list () { protect.init ("dll_list"); }
};
+/* References:
+ http://msdn.microsoft.com/en-us/windows/hardware/gg463125
+ http://msdn.microsoft.com/en-us/library/ms809762.aspx
+*/
+struct pefile {
+ IMAGE_DOS_HEADER dos_hdr;
+
+ char* rva (long offset) { return (char*) this + offset; }
+ PIMAGE_NT_HEADERS32 pe_hdr () { return (PIMAGE_NT_HEADERS32) rva (dos_hdr.e_lfanew); }
+ PIMAGE_OPTIONAL_HEADER32 optional_hdr () { return &pe_hdr ()->OptionalHeader; }
+ PIMAGE_DATA_DIRECTORY idata_dir (DWORD which)
+ {
+ PIMAGE_OPTIONAL_HEADER32 oh = optional_hdr ();
+ return (which < oh->NumberOfRvaAndSizes)? oh->DataDirectory + which : 0;
+ }
+};
+
extern dll_list dlls;
void dll_global_dtors ();
diff --git a/fork.cc b/fork.cc
--- a/fork.cc
+++ b/fork.cc
@@ -600,6 +600,12 @@ fork ()
the problem to be out of temporary TLS path buffers. */
tmp_pathbuf tp;
+ /* Put the dll list in topological dependency ordering, in
+ hopes that the child will have a better shot at loading dlls
+ properly if it only has to deal with one at a time.
+ */
+ dlls.topsort ();
+
if (!held_everything)
{
if (exit_state)
[-- Attachment #3: fork-topsort.changes --]
[-- Type: text/plain, Size: 967 bytes --]
* dll_init.cc (dll_list::find_by_modname): new function to search
the dll list for a module name only (no path).
(dll_list::alloc): Initialize newly-added members of struct dll.
(dll_list::append): new function to factor out the append
operation (used by dll_list::topsort).
(dll_list::populate_deps): new function to identify dll dependencies.
(dll_list::topsort): new function to sort the dll list
topologically by dependencies.
(dll_list::topsort_visit): new helper function for the above.
* dll_init.h (struct dll): added new class members for topsort.
(struct dll_list): declare topsort-related functions.
(struct pefile): allows simple introspection of dll images.
* fork.cc (fork): topsort the dll list before forking. When the
child loads each dll, all dependencies are already loaded and will
no longer risk being pulled in unexpectedly.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-11 18:31 Improvements to fork handling (2/5) Ryan Johnson
@ 2011-05-22 1:44 ` Christopher Faylor
2011-05-22 18:42 ` Ryan Johnson
2011-05-28 20:50 ` Problems with: " Christopher Faylor
1 sibling, 1 reply; 23+ messages in thread
From: Christopher Faylor @ 2011-05-22 1:44 UTC (permalink / raw)
To: cygwin-patches
On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>Hi all,
>
>This patch has the parent sort its dll list topologically by
>dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>in dependencies automatically, and the latter would then not benefit
>from the code which "encourages" them to land in the right places. The
>dependency tracking is achieved using a simple class which allows to
>introspect a mapped dll image and pull out the dependencies it lists.
>The code currently rebuilds the dependency list at every fork rather
>than attempt to update it properly as modules are loaded and unloaded.
>Note that the topsort optimization affects only cygwin dlls, so any
>windows dlls which are pulled in dynamically (directly or indirectly)
>will still impose the usual risk of address space clobbers.
This seems CPU and memory intensive during a time for which we already
know is very slow. Is the benefit really worth it? How much more robust
does it make forking?
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-22 1:44 ` Christopher Faylor
@ 2011-05-22 18:42 ` Ryan Johnson
2011-05-23 7:32 ` Corinna Vinschen
2011-05-24 16:15 ` Corinna Vinschen
0 siblings, 2 replies; 23+ messages in thread
From: Ryan Johnson @ 2011-05-22 18:42 UTC (permalink / raw)
To: cygwin-patches
On 21/05/2011 9:44 PM, Christopher Faylor wrote:
> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>> Hi all,
>>
>> This patch has the parent sort its dll list topologically by
>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>> in dependencies automatically, and the latter would then not benefit
> >from the code which "encourages" them to land in the right places. The
>> dependency tracking is achieved using a simple class which allows to
>> introspect a mapped dll image and pull out the dependencies it lists.
>> The code currently rebuilds the dependency list at every fork rather
>> than attempt to update it properly as modules are loaded and unloaded.
>> Note that the topsort optimization affects only cygwin dlls, so any
>> windows dlls which are pulled in dynamically (directly or indirectly)
>> will still impose the usual risk of address space clobbers.
> This seems CPU and memory intensive during a time for which we already
> know is very slow. Is the benefit really worth it? How much more robust
> does it make forking?
Topological sorting is O(n), so there's no asymptotic change in
performance. Looking up dependencies inside a dll is *very* cheap (2-3
pointer dereferences per dep), and all of this only happens for
dynamically-loaded dlls. Given the number of calls to
Virtual{Alloc,Query,Free} and LoadDynamicLibraryEx which we make, I
would be surprised if the topsort even registered. That said, it is
extra work and will slow down fork.
I have not been able to test how much it helps, but it should help with
the test case Jon Turney reported with python a while back [1]. In fact,
it was that example which made me aware of the potential need for a
topsort in the first place.
In theory, this should completely eliminate the case where us loading
one DLL pulls in dependencies automatically (= uncontrolled and at
Windows' whim). The problem would manifest as a DLL which "loads" in the
same wrong place repeatedly when given the choice, and for which we
would be unable to VirtualAlloc the offending spot (because the dll in
question has non-zero refcount even after we unload it, due to the
dll(s) that depend on it. The currently checked-in code would not detect
this case, because inability to VirtualAlloc just causes ReserveAt and
ReserveUpto to skip that spot silently.
[1] http://cygwin.com/ml/cygwin/2011-04/msg00054.html
Ryan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-22 18:42 ` Ryan Johnson
@ 2011-05-23 7:32 ` Corinna Vinschen
2011-05-24 11:28 ` Ryan Johnson
2011-05-24 20:19 ` Ryan Johnson
2011-05-24 16:15 ` Corinna Vinschen
1 sibling, 2 replies; 23+ messages in thread
From: Corinna Vinschen @ 2011-05-23 7:32 UTC (permalink / raw)
To: cygwin-patches
On May 22 14:42, Ryan Johnson wrote:
> On 21/05/2011 9:44 PM, Christopher Faylor wrote:
> >On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
> >>Hi all,
> >>
> >>This patch has the parent sort its dll list topologically by
> >>dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
> >>in dependencies automatically, and the latter would then not benefit
> >>from the code which "encourages" them to land in the right places. The
> >>dependency tracking is achieved using a simple class which allows to
> >>introspect a mapped dll image and pull out the dependencies it lists.
> >>The code currently rebuilds the dependency list at every fork rather
> >>than attempt to update it properly as modules are loaded and unloaded.
> >>Note that the topsort optimization affects only cygwin dlls, so any
> >>windows dlls which are pulled in dynamically (directly or indirectly)
> >>will still impose the usual risk of address space clobbers.
> >This seems CPU and memory intensive during a time for which we already
> >know is very slow. Is the benefit really worth it? How much more robust
> >does it make forking?
> Topological sorting is O(n), so there's no asymptotic change in
> performance. Looking up dependencies inside a dll is *very* cheap
> (2-3 pointer dereferences per dep), and all of this only happens for
> dynamically-loaded dlls. Given the number of calls to
> Virtual{Alloc,Query,Free} and LoadDynamicLibraryEx which we make, I
> would be surprised if the topsort even registered. That said, it is
> extra work and will slow down fork.
>
> I have not been able to test how much it helps, but it should help
> with the test case Jon Turney reported with python a while back [1].
> In fact, it was that example which made me aware of the potential
> need for a topsort in the first place.
>
> In theory, this should completely eliminate the case where us
> loading one DLL pulls in dependencies automatically (= uncontrolled
> and at Windows' whim). The problem would manifest as a DLL which
> "loads" in the same wrong place repeatedly when given the choice,
> and for which we would be unable to VirtualAlloc the offending spot
> (because the dll in question has non-zero refcount even after we
> unload it, due to the dll(s) that depend on it.
There might be a way around this. It seems to be possible to tweak
the module list the PEB points to so that you can unload a library
even though it has dependencies. Then you can block the unwanted
space and call LoadLibrary again. See (*) for a discussion how
you can unload the exe itself to reload another one. Maybe that's
something we can look into as well. ObNote: Of course, if we
could influnce the address at which a DLL gets loaded right from the
start, it would be the preferrable solution.
Corinna
(*) http://www.blizzhackers.cc/viewtopic.php?p=4332690
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-23 7:32 ` Corinna Vinschen
@ 2011-05-24 11:28 ` Ryan Johnson
2011-05-24 13:06 ` Corinna Vinschen
2011-05-24 20:19 ` Ryan Johnson
1 sibling, 1 reply; 23+ messages in thread
From: Ryan Johnson @ 2011-05-24 11:28 UTC (permalink / raw)
To: cygwin-patches
On 23/05/2011 3:31 AM, Corinna Vinschen wrote:
> On May 22 14:42, Ryan Johnson wrote:
>> On 21/05/2011 9:44 PM, Christopher Faylor wrote:
>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>> Hi all,
>>>>
>>>> This patch has the parent sort its dll list topologically by
>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>>>> in dependencies automatically, and the latter would then not benefit
>>> >from the code which "encourages" them to land in the right places. The
>>>> dependency tracking is achieved using a simple class which allows to
>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>> The code currently rebuilds the dependency list at every fork rather
>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>> will still impose the usual risk of address space clobbers.
>>> This seems CPU and memory intensive during a time for which we already
>>> know is very slow. Is the benefit really worth it? How much more robust
>>> does it make forking?
>> Topological sorting is O(n), so there's no asymptotic change in
>> performance. Looking up dependencies inside a dll is *very* cheap
>> (2-3 pointer dereferences per dep), and all of this only happens for
>> dynamically-loaded dlls. Given the number of calls to
>> Virtual{Alloc,Query,Free} and LoadDynamicLibraryEx which we make, I
>> would be surprised if the topsort even registered. That said, it is
>> extra work and will slow down fork.
>>
>> I have not been able to test how much it helps, but it should help
>> with the test case Jon Turney reported with python a while back [1].
>> In fact, it was that example which made me aware of the potential
>> need for a topsort in the first place.
>>
>> In theory, this should completely eliminate the case where us
>> loading one DLL pulls in dependencies automatically (= uncontrolled
>> and at Windows' whim). The problem would manifest as a DLL which
>> "loads" in the same wrong place repeatedly when given the choice,
>> and for which we would be unable to VirtualAlloc the offending spot
>> (because the dll in question has non-zero refcount even after we
>> unload it, due to the dll(s) that depend on it.
> There might be a way around this. It seems to be possible to tweak
> the module list the PEB points to so that you can unload a library
> even though it has dependencies. Then you can block the unwanted
> space and call LoadLibrary again. See (*) for a discussion how
> you can unload the exe itself to reload another one. Maybe that's
> something we can look into as well. ObNote: Of course, if we
> could influnce the address at which a DLL gets loaded right from the
> start, it would be the preferrable solution.
>
>
> Corinna
>
> (*) http://www.blizzhackers.cc/viewtopic.php?p=4332690
After poking around at (*) a bit, it looks like NtMapViewOfSection does
more than I would have expected wrt dll loading. However, when I tried
to fire up a toy program to test it, NtOpenSection always returns
C0000024, regardless of whether I pass OBJ_KERNEL_HANDLE to it.
Does anybody have experience with those two functions?
Ryan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-24 11:28 ` Ryan Johnson
@ 2011-05-24 13:06 ` Corinna Vinschen
2011-05-24 17:01 ` Ryan Johnson
0 siblings, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2011-05-24 13:06 UTC (permalink / raw)
To: cygwin-patches
On May 24 07:27, Ryan Johnson wrote:
> On 23/05/2011 3:31 AM, Corinna Vinschen wrote:
> >On May 22 14:42, Ryan Johnson wrote:
> >>In theory, this should completely eliminate the case where us
> >>loading one DLL pulls in dependencies automatically (= uncontrolled
> >>and at Windows' whim). The problem would manifest as a DLL which
> >>"loads" in the same wrong place repeatedly when given the choice,
> >>and for which we would be unable to VirtualAlloc the offending spot
> >>(because the dll in question has non-zero refcount even after we
> >>unload it, due to the dll(s) that depend on it.
> >There might be a way around this. It seems to be possible to tweak
> >the module list the PEB points to so that you can unload a library
> >even though it has dependencies. Then you can block the unwanted
> >space and call LoadLibrary again. See (*) for a discussion how
> >you can unload the exe itself to reload another one. Maybe that's
> >something we can look into as well. ObNote: Of course, if we
> >could influnce the address at which a DLL gets loaded right from the
> >start, it would be the preferrable solution.
> >
> >
> >Corinna
> >
> >(*) http://www.blizzhackers.cc/viewtopic.php?p=4332690
> After poking around at (*) a bit, it looks like NtMapViewOfSection
> does more than I would have expected wrt dll loading. However, when
> I tried to fire up a toy program to test it, NtOpenSection always
> returns C0000024, regardless of whether I pass OBJ_KERNEL_HANDLE to
> it.
>
> Does anybody have experience with those two functions?
The Cygwin source code, for instance, is using this function, see
fhandler_mem.cc.
C0000024 is STATUS_OBJECT_TYPE_MISMATCH. I didn't see the error before,
but I have a vague idea why you get it. Are you by any chance trying to
call NtOpenSection with the OBJECT_ATTRIBUTES set to the file? If so,
that's wrong. The OBJECT_ATTRIBUTES names the attributes of the section,
including its name if it has one. It does not specify the file you're
trying to map. What you're looking for is NtCreateSection. It has an
extra parameter to specify the handle to an open file.
Here's an off-the-top-of-my-head example how you use it. Assuming you
already have an open file handle "filehandle" and it's size "filesize":
NTSTATUS status;
HANDLE sectionhandle;
OBJECT_ATTRIBUTES attr;
LARGE_INTEGER sectionsize = { QuadPart: filesize };
InitializeObjectAttributes (&attr, NULL, OBJ_INHERIT, NULL, NULL);
status = NtCreateSection (§ionhandle, SECTION_ALL_ACCESS, &attr,
§ionsize, PAGE_EXECUTE_READ, SEC_IMAGE,
filehandle);
HTH,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-22 18:42 ` Ryan Johnson
2011-05-23 7:32 ` Corinna Vinschen
@ 2011-05-24 16:15 ` Corinna Vinschen
2011-05-24 16:47 ` Ryan Johnson
1 sibling, 1 reply; 23+ messages in thread
From: Corinna Vinschen @ 2011-05-24 16:15 UTC (permalink / raw)
To: cygwin-patches
On May 22 14:42, Ryan Johnson wrote:
> On 21/05/2011 9:44 PM, Christopher Faylor wrote:
> >On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
> >>Hi all,
> >>
> >>This patch has the parent sort its dll list topologically by
> >>dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
> >>in dependencies automatically, and the latter would then not benefit
> >>from the code which "encourages" them to land in the right places. The
> >>dependency tracking is achieved using a simple class which allows to
> >>introspect a mapped dll image and pull out the dependencies it lists.
> >>The code currently rebuilds the dependency list at every fork rather
> >>than attempt to update it properly as modules are loaded and unloaded.
> >>Note that the topsort optimization affects only cygwin dlls, so any
> >>windows dlls which are pulled in dynamically (directly or indirectly)
> >>will still impose the usual risk of address space clobbers.
> >This seems CPU and memory intensive during a time for which we already
> >know is very slow. Is the benefit really worth it? How much more robust
> >does it make forking?
> Topological sorting is O(n), so there's no asymptotic change in
> performance. Looking up dependencies inside a dll is *very* cheap
Btw., isn't the resulting dependency list identical to the list
maintained in the PEB_LDR_DATA member InInitializationOrderModuleList?
Or, in other words, can't we just use the data which is already
available?
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-24 16:15 ` Corinna Vinschen
@ 2011-05-24 16:47 ` Ryan Johnson
2011-05-24 17:03 ` Christopher Faylor
0 siblings, 1 reply; 23+ messages in thread
From: Ryan Johnson @ 2011-05-24 16:47 UTC (permalink / raw)
To: cygwin-patches
On 24/05/2011 12:14 PM, Corinna Vinschen wrote:
> On May 22 14:42, Ryan Johnson wrote:
>> On 21/05/2011 9:44 PM, Christopher Faylor wrote:
>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>> Hi all,
>>>>
>>>> This patch has the parent sort its dll list topologically by
>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>>>> in dependencies automatically, and the latter would then not benefit
>>> >from the code which "encourages" them to land in the right places. The
>>>> dependency tracking is achieved using a simple class which allows to
>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>> The code currently rebuilds the dependency list at every fork rather
>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>> will still impose the usual risk of address space clobbers.
>>> This seems CPU and memory intensive during a time for which we already
>>> know is very slow. Is the benefit really worth it? How much more robust
>>> does it make forking?
>> Topological sorting is O(n), so there's no asymptotic change in
>> performance. Looking up dependencies inside a dll is *very* cheap
> Btw., isn't the resulting dependency list identical to the list
> maintained in the PEB_LDR_DATA member InInitializationOrderModuleList?
> Or, in other words, can't we just use the data which is already
> available?
I read somewhere that dll initialization is not guaranteed to happen in
any particular order, and from what I've seen so far I believe it.
I think that's one reason (among many) why cygwin has to factor the
user's initialization routines out from normal dll init function: they
might call functions in other dlls which might not have been initialized
yet. From what I can tell, though, mapping of all dlls in a batch
completes before any initialization routines run.
Even assuming I'm wrong and dependency order === initialization order,
we'd still have to find a way to isolate those dlls which are both
cygwin-aware and dynamically loaded, because those are the only ones we
care about. Doing that would also be expensive because we'd be searching
the cygwin dll list for each dll in the PEB's list.
The best way to improve performance of this part of fork() would be to
figure out how to force a dll to load in the right place on the first
try. Achieving this admittedly "difficult" task would eliminate multiple
syscalls per dll, the aggregate cost of which dominates the topsort into
oblivion unless I'm very mistaken.
Ryan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-24 13:06 ` Corinna Vinschen
@ 2011-05-24 17:01 ` Ryan Johnson
0 siblings, 0 replies; 23+ messages in thread
From: Ryan Johnson @ 2011-05-24 17:01 UTC (permalink / raw)
To: cygwin-patches
On 24/05/2011 9:05 AM, Corinna Vinschen wrote:
> On May 24 07:27, Ryan Johnson wrote:
>> On 23/05/2011 3:31 AM, Corinna Vinschen wrote:
>>> On May 22 14:42, Ryan Johnson wrote:
>>>> In theory, this should completely eliminate the case where us
>>>> loading one DLL pulls in dependencies automatically (= uncontrolled
>>>> and at Windows' whim). The problem would manifest as a DLL which
>>>> "loads" in the same wrong place repeatedly when given the choice,
>>>> and for which we would be unable to VirtualAlloc the offending spot
>>>> (because the dll in question has non-zero refcount even after we
>>>> unload it, due to the dll(s) that depend on it.
>>> There might be a way around this. It seems to be possible to tweak
>>> the module list the PEB points to so that you can unload a library
>>> even though it has dependencies. Then you can block the unwanted
>>> space and call LoadLibrary again. See (*) for a discussion how
>>> you can unload the exe itself to reload another one. Maybe that's
>>> something we can look into as well. ObNote: Of course, if we
>>> could influnce the address at which a DLL gets loaded right from the
>>> start, it would be the preferrable solution.
>>>
>>>
>>> Corinna
>>>
>>> (*) http://www.blizzhackers.cc/viewtopic.php?p=4332690
>> After poking around at (*) a bit, it looks like NtMapViewOfSection
>> does more than I would have expected wrt dll loading. However, when
>> I tried to fire up a toy program to test it, NtOpenSection always
>> returns C0000024, regardless of whether I pass OBJ_KERNEL_HANDLE to
>> it.
>>
>> Does anybody have experience with those two functions?
> The Cygwin source code, for instance, is using this function, see
> fhandler_mem.cc.
>
> C0000024 is STATUS_OBJECT_TYPE_MISMATCH. I didn't see the error before,
> but I have a vague idea why you get it. Are you by any chance trying to
> call NtOpenSection with the OBJECT_ATTRIBUTES set to the file? If so,
> that's wrong. The OBJECT_ATTRIBUTES names the attributes of the section,
> including its name if it has one. It does not specify the file you're
> trying to map. What you're looking for is NtCreateSection. It has an
> extra parameter to specify the handle to an open file.
Bingo. That was indeed it.
So, playing with NtMapViewOfSection (code below) shows that it knows how
to map a .dll properly (with appropriate permissions for the various
sections), and you can force it to map to a given address, but the
resulting image is not considered a proper .dll -- it doesn't show up in
the dll list and initialization routines don't run (I haven't checked
dependency loading yet). Most likely, this is roughly equivalent to
LoadLibraryEx+DONT_RESOLVE_DLL_REFERENCES.
Another issue arises if the file is mapped to somewhere other than its
preferred image base. In this case the mapping succeeds but returns
STATUS_IMAGE_NOT_AT_BASE, in which case LdrRelocateImage and its friends
need to be called.
An additional difficulty: trying to CreateFile a well-known dll like
psapi returns STATUS_WAIT_3, so file locking seems to be an issue. This
happens even if I pass all possible sharing flags (0x7).
Overall, it looks like going this route would strongly resemble the
custom loader option. There Be Dragons...
Meanwhile, the forced unload of the .exe is really tantalizing, but it
sounds there's a lot of messiness trying to clean up the mess and load
the new image. Granted, this would be simpler since we're trying to
reload the same image (after loading its dependent dlls properly), and
we already know what dlls need to be loaded by looking at the parent
process. However, we'd have to do this without unloading cygwin1.dll
(maybe calling LoadLibrary on it would bump the refcount and protect it?).
On the other hand, if we could get that forced unload to work reliably,
we'd almost have a true exec() on our hands and could use that to
implement fork: run a stub process, load the dlls we know we'll need,
then replace the stub with the actual image we want to run and clean up
the mess we left in the PEB... that would let us control nearly all
dlls, even windows ones.
Ryan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-24 16:47 ` Ryan Johnson
@ 2011-05-24 17:03 ` Christopher Faylor
0 siblings, 0 replies; 23+ messages in thread
From: Christopher Faylor @ 2011-05-24 17:03 UTC (permalink / raw)
To: cygwin-patches
On Tue, May 24, 2011 at 12:47:36PM -0400, Ryan Johnson wrote:
>The best way to improve performance of this part of fork() would be to
>figure out how to force a dll to load in the right place on the first
>try. Achieving this admittedly "difficult" task would eliminate multiple
>syscalls per dll, the aggregate cost of which dominates the topsort into
>oblivion unless I'm very mistaken.
Right. And, if we could cure cancer no one would suffer from cancer
anymore.
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Improvements to fork handling (2/5)
2011-05-23 7:32 ` Corinna Vinschen
2011-05-24 11:28 ` Ryan Johnson
@ 2011-05-24 20:19 ` Ryan Johnson
1 sibling, 0 replies; 23+ messages in thread
From: Ryan Johnson @ 2011-05-24 20:19 UTC (permalink / raw)
To: cygwin-patches
On 23/05/2011 3:31 AM, Corinna Vinschen wrote:
> On May 22 14:42, Ryan Johnson wrote:
>> On 21/05/2011 9:44 PM, Christopher Faylor wrote:
>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>> Hi all,
>>>>
>>>> This patch has the parent sort its dll list topologically by
>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>>>> in dependencies automatically, and the latter would then not benefit
>>> >from the code which "encourages" them to land in the right places. The
>>>> dependency tracking is achieved using a simple class which allows to
>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>> The code currently rebuilds the dependency list at every fork rather
>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>> will still impose the usual risk of address space clobbers.
>>> This seems CPU and memory intensive during a time for which we already
>>> know is very slow. Is the benefit really worth it? How much more robust
>>> does it make forking?
>> Topological sorting is O(n), so there's no asymptotic change in
>> performance. Looking up dependencies inside a dll is *very* cheap
>> (2-3 pointer dereferences per dep), and all of this only happens for
>> dynamically-loaded dlls. Given the number of calls to
>> Virtual{Alloc,Query,Free} and LoadDynamicLibraryEx which we make, I
>> would be surprised if the topsort even registered. That said, it is
>> extra work and will slow down fork.
>>
>> I have not been able to test how much it helps, but it should help
>> with the test case Jon Turney reported with python a while back [1].
>> In fact, it was that example which made me aware of the potential
>> need for a topsort in the first place.
>>
>> In theory, this should completely eliminate the case where us
>> loading one DLL pulls in dependencies automatically (= uncontrolled
>> and at Windows' whim). The problem would manifest as a DLL which
>> "loads" in the same wrong place repeatedly when given the choice,
>> and for which we would be unable to VirtualAlloc the offending spot
>> (because the dll in question has non-zero refcount even after we
>> unload it, due to the dll(s) that depend on it.
> There might be a way around this. It seems to be possible to tweak
> the module list the PEB points to so that you can unload a library
> even though it has dependencies. Then you can block the unwanted
> space and call LoadLibrary again. See (*) for a discussion how
> you can unload the exe itself to reload another one. Maybe that's
> something we can look into as well. ObNote: Of course, if we
> could influnce the address at which a DLL gets loaded right from the
> start, it would be the preferrable solution.
I tested that approach (LoadCount == Reserved5[1] and Flags ==
Reserved5[0] in struct _LDR_DATA_TABLE_ENTRY) and while it lets us
unload statically-linked .dlls it doesn't unload the .exe any more --
setting Flags=4 as recommended has no effect on my w7-x64 machine, nor
does setting Flags=0x80080004 to match other dlls' flags.
In retrospect, I don't think unloading dlls is going to be very helpful
if it leaves the .exe with thunks pointing to stale addresses (and
there's still the business of reloading the .exe afterward). I guess if
none of the thunks had been triggered yet, we might be able to get away
with it, but that sounds risky. We might try copying .idata across from
the parent, but that would clobber any thunks to windows dlls which
changed locations.
Ryan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Problems with: Improvements to fork handling (2/5)
2011-05-11 18:31 Improvements to fork handling (2/5) Ryan Johnson
2011-05-22 1:44 ` Christopher Faylor
@ 2011-05-28 20:50 ` Christopher Faylor
2011-05-28 22:40 ` Ryan Johnson
1 sibling, 1 reply; 23+ messages in thread
From: Christopher Faylor @ 2011-05-28 20:50 UTC (permalink / raw)
To: cygwin-patches, Ryan Johnson
On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>This patch has the parent sort its dll list topologically by
>dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>in dependencies automatically, and the latter would then not benefit
>from the code which "encourages" them to land in the right places. The
>dependency tracking is achieved using a simple class which allows to
>introspect a mapped dll image and pull out the dependencies it lists.
>The code currently rebuilds the dependency list at every fork rather
>than attempt to update it properly as modules are loaded and unloaded.
>Note that the topsort optimization affects only cygwin dlls, so any
>windows dlls which are pulled in dynamically (directly or indirectly)
>will still impose the usual risk of address space clobbers.
Bad news.
I applied this patch and the one after it but then noticed that zsh started
producing: "bad address: " errors.
path:4: bad address: /share/bin/dopath
term:1: bad address: /bin/tee
The errors disappear when I back this patch out.
FWIW, I was running "zsh -l". I have somewhat complicated
.zshrc/.zlogin/.zshenv files. I'll post them if needed.
Until this is fixed, this patch and the subsequent ones which rely on
it, can't go in. I did commit this fix but it has been backed out now.
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Problems with: Improvements to fork handling (2/5)
2011-05-28 20:50 ` Problems with: " Christopher Faylor
@ 2011-05-28 22:40 ` Ryan Johnson
2011-05-29 0:23 ` Christopher Faylor
0 siblings, 1 reply; 23+ messages in thread
From: Ryan Johnson @ 2011-05-28 22:40 UTC (permalink / raw)
To: cygwin-patches
On 28/05/2011 4:50 PM, Christopher Faylor wrote:
> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>> This patch has the parent sort its dll list topologically by
>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>> in dependencies automatically, and the latter would then not benefit
> >from the code which "encourages" them to land in the right places. The
>> dependency tracking is achieved using a simple class which allows to
>> introspect a mapped dll image and pull out the dependencies it lists.
>> The code currently rebuilds the dependency list at every fork rather
>> than attempt to update it properly as modules are loaded and unloaded.
>> Note that the topsort optimization affects only cygwin dlls, so any
>> windows dlls which are pulled in dynamically (directly or indirectly)
>> will still impose the usual risk of address space clobbers.
> Bad news.
>
> I applied this patch and the one after it but then noticed that zsh started
> producing: "bad address: " errors.
>
> path:4: bad address: /share/bin/dopath
> term:1: bad address: /bin/tee
>
> The errors disappear when I back this patch out.
>
> FWIW, I was running "zsh -l". I have somewhat complicated
> .zshrc/.zlogin/.zshenv files. I'll post them if needed.
>
> Until this is fixed, this patch and the subsequent ones which rely on
> it, can't go in. I did commit this fix but it has been backed out now.
Hmm. I also see bad address errors in bash sometimes. However, when I
searched through the cygwin mailing list archives I saw that other
people have reported this problem in the past [1], so I figured it was
some existing, sporadic issue rather than my patch...
Could you tell me what a 'bad address' error is? I'd be happy to debug
this, but really don't know what kind of bug I'm hunting here, except
that it might be a problem wow64 and suspending threads [2]. Whatever
became of these bad address errors the last time(s) they cropped up? I
can't find any resolution with Google, at least.
[1] http://cygwin.com/ml/cygwin/2011-02/msg00215.html
[2]
http://zachsaw.blogspot.com/2010/11/wow64-bug-getthreadcontext-may-return.html
Thoughts?
Ryan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Problems with: Improvements to fork handling (2/5)
2011-05-28 22:40 ` Ryan Johnson
@ 2011-05-29 0:23 ` Christopher Faylor
2011-05-29 2:36 ` Ryan Johnson
0 siblings, 1 reply; 23+ messages in thread
From: Christopher Faylor @ 2011-05-29 0:23 UTC (permalink / raw)
To: cygwin-patches
On Sat, May 28, 2011 at 06:40:30PM -0400, Ryan Johnson wrote:
>On 28/05/2011 4:50 PM, Christopher Faylor wrote:
>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>> This patch has the parent sort its dll list topologically by
>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>>> in dependencies automatically, and the latter would then not benefit
>> >from the code which "encourages" them to land in the right places. The
>>> dependency tracking is achieved using a simple class which allows to
>>> introspect a mapped dll image and pull out the dependencies it lists.
>>> The code currently rebuilds the dependency list at every fork rather
>>> than attempt to update it properly as modules are loaded and unloaded.
>>> Note that the topsort optimization affects only cygwin dlls, so any
>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>> will still impose the usual risk of address space clobbers.
>> Bad news.
>>
>> I applied this patch and the one after it but then noticed that zsh started
>> producing: "bad address: " errors.
>>
>> path:4: bad address: /share/bin/dopath
>> term:1: bad address: /bin/tee
>>
>> The errors disappear when I back this patch out.
>>
>> FWIW, I was running "zsh -l". I have somewhat complicated
>> .zshrc/.zlogin/.zshenv files. I'll post them if needed.
>>
>> Until this is fixed, this patch and the subsequent ones which rely on
>> it, can't go in. I did commit this fix but it has been backed out now.
>Hmm. I also see bad address errors in bash sometimes. However, when I
>searched through the cygwin mailing list archives I saw that other
>people have reported this problem in the past [1], so I figured it was
>some existing, sporadic issue rather than my patch...
>
>Could you tell me what a 'bad address' error is? I'd be happy to debug
>this, but really don't know what kind of bug I'm hunting here, except
>that it might be a problem wow64 and suspending threads [2]. Whatever
>became of these bad address errors the last time(s) they cropped up? I
>can't find any resolution with Google, at least.
If I had any insight beyond "It works without the patch and it doesn't
work with it" I would have shared it.
>[1] http://cygwin.com/ml/cygwin/2011-02/msg00215.html
>[2]
>http://zachsaw.blogspot.com/2010/11/wow64-bug-getthreadcontext-may-return.html
>
>Thoughts?
Pulling random messages out of the mailing list with the term "bad
address" in them is not going to fix the problem. Given all of the
stuff you're doing with memory, it makes sense to me that your code has
a problem rather than this is a coincidental manifestation of a problem
reported three months ago.
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Problems with: Improvements to fork handling (2/5)
2011-05-29 0:23 ` Christopher Faylor
@ 2011-05-29 2:36 ` Ryan Johnson
2011-05-29 4:38 ` Daniel Colascione
0 siblings, 1 reply; 23+ messages in thread
From: Ryan Johnson @ 2011-05-29 2:36 UTC (permalink / raw)
To: cygwin-patches
On 28/05/2011 8:23 PM, Christopher Faylor wrote:
> On Sat, May 28, 2011 at 06:40:30PM -0400, Ryan Johnson wrote:
>> On 28/05/2011 4:50 PM, Christopher Faylor wrote:
>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>> This patch has the parent sort its dll list topologically by
>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>>>> in dependencies automatically, and the latter would then not benefit
>>> >from the code which "encourages" them to land in the right places. The
>>>> dependency tracking is achieved using a simple class which allows to
>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>> The code currently rebuilds the dependency list at every fork rather
>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>> will still impose the usual risk of address space clobbers.
>>> Bad news.
>>>
>>> I applied this patch and the one after it but then noticed that zsh started
>>> producing: "bad address: " errors.
>>>
>>> path:4: bad address: /share/bin/dopath
>>> term:1: bad address: /bin/tee
>>>
>>> The errors disappear when I back this patch out.
>>>
>>> FWIW, I was running "zsh -l". I have somewhat complicated
>>> .zshrc/.zlogin/.zshenv files. I'll post them if needed.
>>>
>>> Until this is fixed, this patch and the subsequent ones which rely on
>>> it, can't go in. I did commit this fix but it has been backed out now.
>> Hmm. I also see bad address errors in bash sometimes. However, when I
>> searched through the cygwin mailing list archives I saw that other
>> people have reported this problem in the past [1], so I figured it was
>> some existing, sporadic issue rather than my patch...
>>
>> Could you tell me what a 'bad address' error is? I'd be happy to debug
>> this, but really don't know what kind of bug I'm hunting here, except
>> that it might be a problem wow64 and suspending threads [2]. Whatever
>> became of these bad address errors the last time(s) they cropped up? I
>> can't find any resolution with Google, at least.
> If I had any insight beyond "It works without the patch and it doesn't
> work with it" I would have shared it.
Let me rephrase a bit... The error happens too early in fork for gdb to
be any help, and I was hoping you could tell me what part(s) of cygwin
are capable of "raising" this error -- it seems to be a linux (not
Windows) flavor of error message, but the case-insensitive regexp
'bad.address' does not match anything in the cygwin sources.
Ryan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Problems with: Improvements to fork handling (2/5)
2011-05-29 2:36 ` Ryan Johnson
@ 2011-05-29 4:38 ` Daniel Colascione
2011-05-29 8:00 ` Christopher Faylor
2011-05-29 8:42 ` Bug in cmalloc? Was: " Ryan Johnson
0 siblings, 2 replies; 23+ messages in thread
From: Daniel Colascione @ 2011-05-29 4:38 UTC (permalink / raw)
To: cygwin-patches
[-- Attachment #1: Type: text/plain, Size: 3371 bytes --]
On 5/28/11 7:35 PM, Ryan Johnson wrote:
> On 28/05/2011 8:23 PM, Christopher Faylor wrote:
>> On Sat, May 28, 2011 at 06:40:30PM -0400, Ryan Johnson wrote:
>>> On 28/05/2011 4:50 PM, Christopher Faylor wrote:
>>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>>> This patch has the parent sort its dll list topologically by
>>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked
>>>>> pulling
>>>>> in dependencies automatically, and the latter would then not benefit
>>>> >from the code which "encourages" them to land in the right places. The
>>>>> dependency tracking is achieved using a simple class which allows to
>>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>>> The code currently rebuilds the dependency list at every fork rather
>>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>>> will still impose the usual risk of address space clobbers.
>>>> Bad news.
>>>>
>>>> I applied this patch and the one after it but then noticed that zsh
>>>> started
>>>> producing: "bad address: " errors.
>>>>
>>>> path:4: bad address: /share/bin/dopath
>>>> term:1: bad address: /bin/tee
>>>>
>>>> The errors disappear when I back this patch out.
>>>>
>>>> FWIW, I was running "zsh -l". I have somewhat complicated
>>>> .zshrc/.zlogin/.zshenv files. I'll post them if needed.
>>>>
>>>> Until this is fixed, this patch and the subsequent ones which rely on
>>>> it, can't go in. I did commit this fix but it has been backed out now.
>>> Hmm. I also see bad address errors in bash sometimes. However, when I
>>> searched through the cygwin mailing list archives I saw that other
>>> people have reported this problem in the past [1], so I figured it was
>>> some existing, sporadic issue rather than my patch...
>>>
>>> Could you tell me what a 'bad address' error is? I'd be happy to debug
>>> this, but really don't know what kind of bug I'm hunting here, except
>>> that it might be a problem wow64 and suspending threads [2]. Whatever
>>> became of these bad address errors the last time(s) they cropped up? I
>>> can't find any resolution with Google, at least.
>> If I had any insight beyond "It works without the patch and it doesn't
>> work with it" I would have shared it.
> Let me rephrase a bit... The error happens too early in fork for gdb to
> be any help, and I was hoping you could tell me what part(s) of cygwin
> are capable of "raising" this error -- it seems to be a linux (not
> Windows) flavor of error message, but the case-insensitive regexp
> 'bad.address' does not match anything in the cygwin sources.
The actual string is in strerror.c in newlib, which is why you didn't
find it with a grep on winsup/cygwin. The error code is EFAULT. There
are 127 places in the cygwin1.dll where EFAULT can raised, according to
grep '\bEFAULT\b' *.cc. In most of them, EFAULT is raised after
something called san has detected that to Windows raised an unexpected
structured exception. My hunch would be to look in spawn.cc first, in
spawn_guts, but I haven't read your patch in enough detail to narrow the
problem down further. strace might be handy.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Problems with: Improvements to fork handling (2/5)
2011-05-29 4:38 ` Daniel Colascione
@ 2011-05-29 8:00 ` Christopher Faylor
2011-05-29 8:42 ` Bug in cmalloc? Was: " Ryan Johnson
1 sibling, 0 replies; 23+ messages in thread
From: Christopher Faylor @ 2011-05-29 8:00 UTC (permalink / raw)
To: cygwin-patches
On Sat, May 28, 2011 at 09:37:49PM -0700, Daniel Colascione wrote:
>On 5/28/11 7:35 PM, Ryan Johnson wrote:
>> On 28/05/2011 8:23 PM, Christopher Faylor wrote:
>>> On Sat, May 28, 2011 at 06:40:30PM -0400, Ryan Johnson wrote:
>>>> On 28/05/2011 4:50 PM, Christopher Faylor wrote:
>>>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>>>> This patch has the parent sort its dll list topologically by
>>>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked
>>>>>> pulling
>>>>>> in dependencies automatically, and the latter would then not benefit
>>>>> >from the code which "encourages" them to land in the right places. The
>>>>>> dependency tracking is achieved using a simple class which allows to
>>>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>>>> The code currently rebuilds the dependency list at every fork rather
>>>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>>>> will still impose the usual risk of address space clobbers.
>>>>> Bad news.
>>>>>
>>>>> I applied this patch and the one after it but then noticed that zsh
>>>>> started
>>>>> producing: "bad address: " errors.
>>>>>
>>>>> path:4: bad address: /share/bin/dopath
>>>>> term:1: bad address: /bin/tee
>>>>>
>>>>> The errors disappear when I back this patch out.
>>>>>
>>>>> FWIW, I was running "zsh -l". I have somewhat complicated
>>>>> .zshrc/.zlogin/.zshenv files. I'll post them if needed.
>>>>>
>>>>> Until this is fixed, this patch and the subsequent ones which rely on
>>>>> it, can't go in. I did commit this fix but it has been backed out now.
>>>> Hmm. I also see bad address errors in bash sometimes. However, when I
>>>> searched through the cygwin mailing list archives I saw that other
>>>> people have reported this problem in the past [1], so I figured it was
>>>> some existing, sporadic issue rather than my patch...
>>>>
>>>> Could you tell me what a 'bad address' error is? I'd be happy to debug
>>>> this, but really don't know what kind of bug I'm hunting here, except
>>>> that it might be a problem wow64 and suspending threads [2]. Whatever
>>>> became of these bad address errors the last time(s) they cropped up? I
>>>> can't find any resolution with Google, at least.
>>> If I had any insight beyond "It works without the patch and it doesn't
>>> work with it" I would have shared it.
>
>> Let me rephrase a bit... The error happens too early in fork for gdb to
>> be any help, and I was hoping you could tell me what part(s) of cygwin
>> are capable of "raising" this error -- it seems to be a linux (not
>> Windows) flavor of error message, but the case-insensitive regexp
>> 'bad.address' does not match anything in the cygwin sources.
>
>The actual string is in strerror.c in newlib, which is why you didn't
>find it with a grep on winsup/cygwin.
It doesn't come from newlib. If you grep "Bad address" in
winsup/cygwin/* you will see where it comes from.
I don't know why grep -i 'bad.address' didn't find anything but it
really is there.
>The error code is EFAULT. There are 127 places in the cygwin1.dll
>where EFAULT can raised, according to grep '\bEFAULT\b' *.cc. In most
>of them, EFAULT is raised after something called san has detected that
>to Windows raised an unexpected structured exception. My hunch would
>be to look in spawn.cc first, in spawn_guts, but I haven't read your
>patch in enough detail to narrow the problem down further. strace
>might be handy.
spawn.cc wasn't touched by the patch but I suppose something in the
modified fork code could affect a subsequent exec.
Anyway, I don't see any indication that the problem is too early for the
CYGWIN_DEBUG= trick to work. See how-to-debug-cygwin.txt.
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Bug in cmalloc? Was: Re: Problems with: Improvements to fork handling (2/5)
2011-05-29 4:38 ` Daniel Colascione
2011-05-29 8:00 ` Christopher Faylor
@ 2011-05-29 8:42 ` Ryan Johnson
2011-05-29 16:28 ` Christopher Faylor
1 sibling, 1 reply; 23+ messages in thread
From: Ryan Johnson @ 2011-05-29 8:42 UTC (permalink / raw)
To: cygwin-patches
On 29/05/2011 12:37 AM, Daniel Colascione wrote:
> On 5/28/11 7:35 PM, Ryan Johnson wrote:
>> On 28/05/2011 8:23 PM, Christopher Faylor wrote:
>>> On Sat, May 28, 2011 at 06:40:30PM -0400, Ryan Johnson wrote:
>>>> On 28/05/2011 4:50 PM, Christopher Faylor wrote:
>>>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>>>> This patch has the parent sort its dll list topologically by
>>>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked
>>>>>> pulling
>>>>>> in dependencies automatically, and the latter would then not benefit
>>>>> >from the code which "encourages" them to land in the right places. The
>>>>>> dependency tracking is achieved using a simple class which allows to
>>>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>>>> The code currently rebuilds the dependency list at every fork rather
>>>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>>>> will still impose the usual risk of address space clobbers.
>>>>> Bad news.
>>>>>
>>>>> I applied this patch and the one after it but then noticed that zsh
>>>>> started
>>>>> producing: "bad address: " errors.
>>>>>
>>>>> path:4: bad address: /share/bin/dopath
>>>>> term:1: bad address: /bin/tee
>>>>>
>>>>> The errors disappear when I back this patch out.
>>>>>
>>>>> FWIW, I was running "zsh -l". I have somewhat complicated
>>>>> .zshrc/.zlogin/.zshenv files. I'll post them if needed.
>>>>>
>>>>> Until this is fixed, this patch and the subsequent ones which rely on
>>>>> it, can't go in. I did commit this fix but it has been backed out now.
>>>> Hmm. I also see bad address errors in bash sometimes. However, when I
>>>> searched through the cygwin mailing list archives I saw that other
>>>> people have reported this problem in the past [1], so I figured it was
>>>> some existing, sporadic issue rather than my patch...
>>>>
>>>> Could you tell me what a 'bad address' error is? I'd be happy to debug
>>>> this, but really don't know what kind of bug I'm hunting here, except
>>>> that it might be a problem wow64 and suspending threads [2]. Whatever
>>>> became of these bad address errors the last time(s) they cropped up? I
>>>> can't find any resolution with Google, at least.
>>> If I had any insight beyond "It works without the patch and it doesn't
>>> work with it" I would have shared it.
>> Let me rephrase a bit... The error happens too early in fork for gdb to
>> be any help, and I was hoping you could tell me what part(s) of cygwin
>> are capable of "raising" this error -- it seems to be a linux (not
>> Windows) flavor of error message, but the case-insensitive regexp
>> 'bad.address' does not match anything in the cygwin sources.
> The actual string is in strerror.c in newlib, which is why you didn't
> find it with a grep on winsup/cygwin. The error code is EFAULT. There
> are 127 places in the cygwin1.dll where EFAULT can raised, according to
> grep '\bEFAULT\b' *.cc. In most of them, EFAULT is raised after
> something called san has detected that to Windows raised an unexpected
> structured exception. My hunch would be to look in spawn.cc first, in
> spawn_guts, but I haven't read your patch in enough detail to narrow the
> problem down further. strace might be handy.
Wow. You nailed it. I fired up windbg and had it follow a cmd.exe into
bash --login and descendants, and sure enough, there was an access
violation when spawn_guts called build_env and the latter called cmalloc.
Anyway, after a long and twisty debug session, I have reason to believe
that either cmalloc/cfree is buggy, or the bug resides somewhere else
entirely and my patch just tickles it: freeing memory in the reverse
order it was allocated in causes Bad Things (tm) to happen, even on an
unpatched version of my 12 May CVS snapshot (which predates my fork
patches).
It is a known/intentional restriction of the HEAP_2_DLL that memory must
be freed in the order it was allocated? The default cygheap most likely
has the problem as well -- that's why I was crashing and had to call
cmalloc instead of crealloc(NULL, ...).
(Hairy details and test case below)
Ryan
Details: Disabling the topsort call fixed the problem (no surprise), as
did allocating dependencies from static storage rather than calling
malloc. I worried the latter just masked the bug, and sure enough,
calling cmalloc/cfree on dummy data during the sort brought the bug back
-- even though the cmalloc'd memory was never touched (??). However,
calling cmalloc/cfree once per dll instead of performing the topsort did
not trigger the bug. Reversing order of the list (with cmalloc'd
dependencies) instead of topsorting it caused the bug to come back.
So, I defined this small function:
static void break_cmalloc(int depth, int maxdepth) {
void* x = cmalloc (HEAP_2_DLL, 32);
cfree(x);
if (depth < maxdepth)
break_cmalloc(depth+1, maxdepth);
}
and called it during fork instead of dlls.topsort(), with maxdepth=5. No
bug (as expected).
Then I moved the call to cfree below the recursion, so memory gets freed
in reverse order. Bang. Bash goes down and takes mintty with it after
briefly flashing 'bad address.' Calling bash from cmd.exe hangs the
latter so badly Windows can't kill it (I guess I'll have to reboot).
I rolled back to the last CVS checkout I did (May 12), and the above
function reproduces the bug there as well. AFAIK, Corinna had merged my
/proc/maps patch by then, but the only fork-related changes were to fix
the uninitialized value in that one constructor, and to make failed
forkees not call dtors.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Bug in cmalloc? Was: Re: Problems with: Improvements to fork handling (2/5)
2011-05-29 8:42 ` Bug in cmalloc? Was: " Ryan Johnson
@ 2011-05-29 16:28 ` Christopher Faylor
2011-05-30 6:25 ` Christopher Faylor
0 siblings, 1 reply; 23+ messages in thread
From: Christopher Faylor @ 2011-05-29 16:28 UTC (permalink / raw)
To: cygwin-patches
On Sun, May 29, 2011 at 01:51:35AM -0400, Ryan Johnson wrote:
>So, I defined this small function:
>
>static void break_cmalloc(int depth, int maxdepth) {
> void* x = cmalloc (HEAP_2_DLL, 32);
> cfree(x);
> if (depth < maxdepth)
> break_cmalloc(depth+1, maxdepth);
>}
>
>and called it during fork instead of dlls.topsort(), with maxdepth=5. No
>bug (as expected).
>
>Then I moved the call to cfree below the recursion, so memory gets freed
>in reverse order. Bang. Bash goes down and takes mintty with it after
>briefly flashing 'bad address.' Calling bash from cmd.exe hangs the
>latter so badly Windows can't kill it (I guess I'll have to reboot).
Thanks for the test case. I'll investigate later today.
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Bug in cmalloc? Was: Re: Problems with: Improvements to fork handling (2/5)
2011-05-29 16:28 ` Christopher Faylor
@ 2011-05-30 6:25 ` Christopher Faylor
2011-05-30 6:54 ` Christopher Faylor
2011-05-30 12:25 ` Ryan Johnson
0 siblings, 2 replies; 23+ messages in thread
From: Christopher Faylor @ 2011-05-30 6:25 UTC (permalink / raw)
To: cygwin-patches
On Sun, May 29, 2011 at 12:27:45PM -0400, Christopher Faylor wrote:
>On Sun, May 29, 2011 at 01:51:35AM -0400, Ryan Johnson wrote:
>>So, I defined this small function:
>>
>>static void break_cmalloc(int depth, int maxdepth) {
>> void* x = cmalloc (HEAP_2_DLL, 32);
>> cfree(x);
>> if (depth < maxdepth)
>> break_cmalloc(depth+1, maxdepth);
>>}
>>
>>and called it during fork instead of dlls.topsort(), with maxdepth=5. No
>>bug (as expected).
>>
>>Then I moved the call to cfree below the recursion, so memory gets freed
>>in reverse order. Bang. Bash goes down and takes mintty with it after
>>briefly flashing 'bad address.' Calling bash from cmd.exe hangs the
>>latter so badly Windows can't kill it (I guess I'll have to reboot).
>
>Thanks for the test case. I'll investigate later today.
As it turns out, this is not a bug in cmalloc. fork() was not designed
to allow calling cmalloc() after the "frok grouped" definition at the
beginning of the function. That records the bounds of the cygheap which
is passed to the child. If you call cmalloc and friends after that then
the child process will never know that the heap has been extended. Then
when the heap is copied from the parent by cygheap_fixup_in_child(),
you'll likely be missing pieces of the parent's cygheap and pieces of
the freed pool will end up pointing into blocks of memory which are not
properly initialized.
So, anyway, the problem can likely be fixed by moving the call to
topsort earlier. I'll try that tomorrow.
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Bug in cmalloc? Was: Re: Problems with: Improvements to fork handling (2/5)
2011-05-30 6:25 ` Christopher Faylor
@ 2011-05-30 6:54 ` Christopher Faylor
2011-05-30 7:15 ` Christopher Faylor
2011-05-30 12:25 ` Ryan Johnson
1 sibling, 1 reply; 23+ messages in thread
From: Christopher Faylor @ 2011-05-30 6:54 UTC (permalink / raw)
To: cygwin-patches
On Mon, May 30, 2011 at 02:24:49AM -0400, Christopher Faylor wrote:
>On Sun, May 29, 2011 at 12:27:45PM -0400, Christopher Faylor wrote:
>>On Sun, May 29, 2011 at 01:51:35AM -0400, Ryan Johnson wrote:
>>>So, I defined this small function:
>>>
>>>static void break_cmalloc(int depth, int maxdepth) {
>>> void* x = cmalloc (HEAP_2_DLL, 32);
>>> cfree(x);
>>> if (depth < maxdepth)
>>> break_cmalloc(depth+1, maxdepth);
>>>}
>>>
>>>and called it during fork instead of dlls.topsort(), with maxdepth=5. No
>>>bug (as expected).
>>>
>>>Then I moved the call to cfree below the recursion, so memory gets freed
>>>in reverse order. Bang. Bash goes down and takes mintty with it after
>>>briefly flashing 'bad address.' Calling bash from cmd.exe hangs the
>>>latter so badly Windows can't kill it (I guess I'll have to reboot).
>>
>>Thanks for the test case. I'll investigate later today.
>
>As it turns out, this is not a bug in cmalloc. fork() was not designed
>to allow calling cmalloc() after the "frok grouped" definition at the
>beginning of the function. That records the bounds of the cygheap which
>is passed to the child. If you call cmalloc and friends after that then
>the child process will never know that the heap has been extended. Then
>when the heap is copied from the parent by cygheap_fixup_in_child(),
>you'll likely be missing pieces of the parent's cygheap and pieces of
>the freed pool will end up pointing into blocks of memory which are not
>properly initialized.
>
>So, anyway, the problem can likely be fixed by moving the call to
>topsort earlier. I'll try that tomorrow.
Actually, I checked in patch 2/5. That completes the set, I think.
We're not going to check in 5/5 since it seems pretty iffy.
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Bug in cmalloc? Was: Re: Problems with: Improvements to fork handling (2/5)
2011-05-30 6:54 ` Christopher Faylor
@ 2011-05-30 7:15 ` Christopher Faylor
0 siblings, 0 replies; 23+ messages in thread
From: Christopher Faylor @ 2011-05-30 7:15 UTC (permalink / raw)
To: cygwin-patches
On Mon, May 30, 2011 at 02:53:51AM -0400, Christopher Faylor wrote:
>Actually, I checked in patch 2/5. That completes the set, I think.
Nope. I still have to do 4/5. I WILL do that when after sleeping first.
cgf
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Bug in cmalloc? Was: Re: Problems with: Improvements to fork handling (2/5)
2011-05-30 6:25 ` Christopher Faylor
2011-05-30 6:54 ` Christopher Faylor
@ 2011-05-30 12:25 ` Ryan Johnson
1 sibling, 0 replies; 23+ messages in thread
From: Ryan Johnson @ 2011-05-30 12:25 UTC (permalink / raw)
To: cygwin-patches
On 30/05/2011 2:24 AM, Christopher Faylor wrote:
> On Sun, May 29, 2011 at 12:27:45PM -0400, Christopher Faylor wrote:
>> On Sun, May 29, 2011 at 01:51:35AM -0400, Ryan Johnson wrote:
>>> So, I defined this small function:
>>>
>>> static void break_cmalloc(int depth, int maxdepth) {
>>> void* x = cmalloc (HEAP_2_DLL, 32);
>>> cfree(x);
>>> if (depth< maxdepth)
>>> break_cmalloc(depth+1, maxdepth);
>>> }
>>>
>>> and called it during fork instead of dlls.topsort(), with maxdepth=5. No
>>> bug (as expected).
>>>
>>> Then I moved the call to cfree below the recursion, so memory gets freed
>>> in reverse order. Bang. Bash goes down and takes mintty with it after
>>> briefly flashing 'bad address.' Calling bash from cmd.exe hangs the
>>> latter so badly Windows can't kill it (I guess I'll have to reboot).
>> Thanks for the test case. I'll investigate later today.
> As it turns out, this is not a bug in cmalloc. fork() was not designed
> to allow calling cmalloc() after the "frok grouped" definition at the
> beginning of the function. That records the bounds of the cygheap which
> is passed to the child. If you call cmalloc and friends after that then
> the child process will never know that the heap has been extended. Then
> when the heap is copied from the parent by cygheap_fixup_in_child(),
> you'll likely be missing pieces of the parent's cygheap and pieces of
> the freed pool will end up pointing into blocks of memory which are not
> properly initialized.
Ouch... and those pieces that didn't get copied are exactly the ones the
child tries to read first when loading dlls.
Sorry for the false alarm -- I always assumed that was done after the
call to setjmp, on the parent's side. Should have checked.
BTW, while poring over the patch I noticed a couple irregularities which
you might want to remove:
1. The declaration for populate_all_deps in dll_init.h is never defined
or used
2. The incrementing of the (seemingly dead) variable 'tot' ended up in
dll_list::append and should be moved back to dll_list::alloc where it
belongs.
Ryan
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2011-05-30 12:25 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-11 18:31 Improvements to fork handling (2/5) Ryan Johnson
2011-05-22 1:44 ` Christopher Faylor
2011-05-22 18:42 ` Ryan Johnson
2011-05-23 7:32 ` Corinna Vinschen
2011-05-24 11:28 ` Ryan Johnson
2011-05-24 13:06 ` Corinna Vinschen
2011-05-24 17:01 ` Ryan Johnson
2011-05-24 20:19 ` Ryan Johnson
2011-05-24 16:15 ` Corinna Vinschen
2011-05-24 16:47 ` Ryan Johnson
2011-05-24 17:03 ` Christopher Faylor
2011-05-28 20:50 ` Problems with: " Christopher Faylor
2011-05-28 22:40 ` Ryan Johnson
2011-05-29 0:23 ` Christopher Faylor
2011-05-29 2:36 ` Ryan Johnson
2011-05-29 4:38 ` Daniel Colascione
2011-05-29 8:00 ` Christopher Faylor
2011-05-29 8:42 ` Bug in cmalloc? Was: " Ryan Johnson
2011-05-29 16:28 ` Christopher Faylor
2011-05-30 6:25 ` Christopher Faylor
2011-05-30 6:54 ` Christopher Faylor
2011-05-30 7:15 ` Christopher Faylor
2011-05-30 12:25 ` Ryan Johnson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).