* coredumps and/or CPU eating zombies after dlopen/fork
@ 2022-11-25 13:22 Dmitry Karasik
  2022-11-25 15:38 ` Thomas Wolff
  2022-11-26  1:43 ` Takashi Yano
  0 siblings, 2 replies; 3+ messages in thread
From: Dmitry Karasik @ 2022-11-25 13:22 UTC (permalink / raw)
  To: cygwin

URL: http://karasik.eu.org/misc/cygwin/

Dear all,

Here is an exception that occurs when gtk_settings_get_default() is called from a
DLL and fork() is called afterwards.  The bug does not occur if the call is
made in the main program, nor does it occur if GTK is initialized but
gtk_settings_get_default() is never called.

Warning: if you run ./dlload.exe without the CYGWIN environment variable set to
invoke dumper, which terminates the process, your system will accumulate
zombie-like copies of dlload.exe that eat CPU. strace shows that these zombie
processes hit exceptions repeatedly in an endless loop. The following strace
output repeats forever after the fork:

--- Process 9108 (pid: 10439), exception c0000005 at 00000003f5baa8e0
 1960   21097 [main] perl 10439 exception::handle: In cygwin_except_handler exception 0xC0000005 at 0x3F5BAA8E0 sp 0xFFFFC5A8
   16   21113 [main] perl 10439 exception::handle: In cygwin_except_handler signal 11 at 0x3F5BAA8E0
   14   21127 [main] perl 10439 try_to_debug: debugger_command 'dumper "./dlload.exe"'
   23   21150 [main] perl 10439 break_here: break here
   12   21162 [main] perl 10439 sig_send: sendsig 0x13C, pid 10439, signal 11, its_me 1
   14   21176 [main] perl 10439 sig_send: wakeup 0x3F4
   15   21191 [main] perl 10439 sig_send: Waiting for pack.wakeup 0x3F4
   19   21210 [sig] perl 10439 sigpacket::process: returning -1
   19   21229 [sig] perl 10439 wait_sig: signalling pack.wakeup 0x3F4
   17   21246 [main] perl 10439 sig_send: returning 0x0 from sending signal 11

I ran into this problem when random perl and python scripts started hanging (apparently waiting for a
forked child that never exited); after pressing ^C, I noticed the accumulated zombie processes.
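
A parent can protect itself against a forked child that never exits by polling
waitpid() with a deadline instead of blocking forever. The following is a
minimal sketch, not part of the reproducer; the names run_child_with_timeout,
child_ok and child_hang and the negative return codes are assumptions made for
illustration. Whether SIGKILL reliably removes the spinning children described
here on Cygwin is not confirmed anywhere in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork a child that runs child_fn, then wait at most timeout_ms for it.
 * Returns the child's exit code (0..255), -1 if fork/waitpid failed,
 * -2 if the child died from a signal, -3 if it was killed after the timeout.
 * (Hypothetical helper; the return-code convention is an assumption.) */
int run_child_with_timeout(int (*child_fn)(void), int timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_fn());              /* child: never fall back into parent code */

    const struct timespec step = { 0, 10 * 1000 * 1000 };   /* 10 ms per poll */
    int status = 0;

    for (int waited = 0; ; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);   /* non-blocking check */
        if (r == pid)
            break;                      /* child has terminated */
        if (r < 0)
            return -1;
        if (waited >= timeout_ms) {
            kill(pid, SIGKILL);         /* give up on the child */
            waitpid(pid, &status, 0);   /* reap it so no zombie is left */
            return -3;
        }
        nanosleep(&step, NULL);
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -2;
}

/* Two stand-in children: one exits cleanly, one never returns. */
int child_ok(void)   { return 0; }
int child_hang(void) { for (;;) pause(); }
```

Calling run_child_with_timeout(child_hang, 100) returns after roughly 100 ms
instead of blocking the caller indefinitely.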

The dumper's coredump doesn't show the culprit, but it does show this:
(gdb) bt
#0  0x00007ffa4870d744 in ntdll!ZwDelayExecution () from C:/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007ffa4601b03e in SleepEx () from C:/WINDOWS/System32/KERNELBASE.dll
#2  0x000000018006205a in try_to_debug () from C:/cygwin64/bin/cygwin1.dll
#3  0x00000001800624f6 in exception::handle(_EXCEPTION_RECORD*, void*, _CONTEXT*, _DISPATCHER_CONTEXT*) () from C:/cygwin64/bin/cygwin1.dll
#4  0x00007ffa4871241f in ntdll!.chkstk () from C:/WINDOWS/SYSTEM32/ntdll.dll
#5  0x00007ffa486c14a4 in ntdll!RtlRaiseException () from C:/WINDOWS/SYSTEM32/ntdll.dll
#6  0x00007ffa48710f4e in ntdll!KiUserExceptionDispatcher () from C:/WINDOWS/SYSTEM32/ntdll.dll
#7  0x00000003f5baa8e0 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

which seems to indicate that the exception is somewhere in the Cygwin runtime.  I
haven't yet tried to find out where exactly that bug in the runtime is, as
I'd first like to hear whether there are any smart strategies for doing that.

Nor have I succeeded in reducing gtk_settings_get_default() to something more
digestible (that call was already the smallest reproducer), even though I
recompiled gtk3 locally. Strangely, its strace doesn't show anything suspicious:
no forks, no open sockets, no pipe calls, just file opens (see strace.gsettings).

Please advise how to proceed if I can help fix this; so far I'm a bit stuck.

Otherwise, to reproduce, download and unpack http://karasik.eu.org/misc/cygwin/cygwin-gtk-dlopen-fork-bug.tar
and run ./try there.

-- 
Sincerely,
	Dmitry Karasik



* Re: coredumps and/or CPU eating zombies after dlopen/fork
  2022-11-25 13:22 coredumps and/or CPU eating zombies after dlopen/fork Dmitry Karasik
@ 2022-11-25 15:38 ` Thomas Wolff
  2022-11-26  1:43 ` Takashi Yano
  1 sibling, 0 replies; 3+ messages in thread
From: Thomas Wolff @ 2022-11-25 15:38 UTC (permalink / raw)
  To: cygwin



On 25/11/2022 at 14:22, Dmitry Karasik wrote:
> Kindly advise how to proceed if I can help fixing this, so far I'm a bit stuck.
I had trouble with dlopen myself until I found that it cannot be nested, i.e.
when a library loaded with dlopen uses dlopen itself.
In my case, it helped to pass the flags RTLD_LAZY | RTLD_GLOBAL to dlopen.
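
A minimal sketch of a dlopen call with those flags follows. The wrapper name
open_global is an assumption made for illustration, and nothing in this thread
confirms that the flags change the behaviour of the reported bug.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Open a shared library with RTLD_LAZY | RTLD_GLOBAL.
 * RTLD_LAZY defers symbol resolution until first use; RTLD_GLOBAL makes the
 * library's symbols available to libraries loaded afterwards.
 * Returns the handle, or NULL after reporting dlerror() on stderr.
 * (Hypothetical wrapper; a NULL path asks for a handle to the main program.) */
void *open_global(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s): %s\n", path ? path : "(self)", dlerror());
    return handle;
}
```

On systems where dlopen lives in a separate library, link with -ldl.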




* Re: coredumps and/or CPU eating zombies after dlopen/fork
  2022-11-25 13:22 coredumps and/or CPU eating zombies after dlopen/fork Dmitry Karasik
  2022-11-25 15:38 ` Thomas Wolff
@ 2022-11-26  1:43 ` Takashi Yano
  1 sibling, 0 replies; 3+ messages in thread
From: Takashi Yano @ 2022-11-26  1:43 UTC (permalink / raw)
  To: cygwin; +Cc: Dmitry Karasik

On Fri, 25 Nov 2022 14:22:15 +0100
Dmitry Karasik wrote:
> Otherwise, to reproduce, download and unpack http://karasik.eu.org/misc/cygwin/cygwin-gtk-dlopen-fork-bug.tar
> and run ./try there.

Why do you include windows.h in test.c?
I encountered the following error.

/usr/include/w32api/processthreadsapi.h:165:11: Error: expected identifier or ‘(’ before numeric constant
  165 |     ULONG ControlMask;
      |           ^~~~~~~~~~~

Removing #include <windows.h> or adding
#undef ControlMask
is necessary to compile test.c without errors.
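
The error message is consistent with ControlMask being a macro that expands to
a numeric expression before windows.h is read. The sketch below simulates that
mechanism without either real header. It assumes the macro comes from X11's X.h,
which defines ControlMask as (1<<2); the thread does not say which header
test.c actually pulls it from. X11_CONTROL_MASK and control_policy_sketch are
hypothetical names.

```c
/* Stand-in for an earlier header; X11's X.h contains this definition.
 * That test.c gets the macro from X.h is an assumption. */
#define ControlMask (1<<2)

/* Preserve the value under another (hypothetical) name, then drop the macro. */
enum { X11_CONTROL_MASK = ControlMask };
#undef ControlMask

/* Stand-in for the struct member declared in processthreadsapi.h.
 * Without the #undef above, the member would expand to "ULONG (1<<2);",
 * which the compiler rejects. */
typedef unsigned long ULONG;
struct control_policy_sketch {
    ULONG ControlMask;
};
```

In real code the #undef has to appear after the header that defines the macro
and before #include <windows.h>.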

With that modification, I cannot reproduce your problem
in my environment. The result is as follows.

not expecting a bug
child existed with 0
expecting a bug
child existed with 0
no bug detected

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>
