Date: Fri, 25 Nov 2022 14:22:15 +0100
From: Dmitry Karasik
To: cygwin@cygwin.com
Subject: coredumps and/or CPU eating zombies after dlopen/fork
URL: http://karasik.eu.org/misc/cygwin/

Dear all,

Here is an exception that occurs if gtk_settings_get_default() is called from a DLL and fork() is called later.
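The trigger can be sketched as a small loader, assuming a DLL that exports a no-argument function wrapping gtk_settings_get_default(). This is not the actual dlload.exe source from the reproduction tarball; the function names, return codes and timeout are hypothetical. Because the forked child may loop on the exception instead of exiting, the sketch polls waitpid() with WNOHANG and kills the child with SIGKILL after a timeout, so repeated runs do not pile up CPU-eating processes.

```c
/*
 * Sketch of the dlopen-then-fork pattern (hypothetical helper names).
 * Assumptions, not taken from the original test case:
 *   - the DLL path and the exported symbol name are passed in by the caller;
 *   - the exported function takes no arguments and returns nothing
 *     (in the report it would call gtk_settings_get_default()).
 * The forked child gets timeout_ms to exit; if it does not, it is
 * killed with SIGKILL and reaped, so a looping child cannot linger.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork, let the child exit immediately, and wait up to timeout_ms.
 * Returns the child's exit status, -1 on error or if the child died
 * from a signal, or -2 if the child had to be killed because it never exited. */
int fork_and_reap(long timeout_ms)
{
    assert(timeout_ms >= 0);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: do nothing, just exit */

    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    for (long waited = 0; waited <= timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }
    /* Child is still running (e.g. stuck in an exception loop): kill and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -2;
}

/* Load a DLL, call one of its exported void(void) functions, then fork.
 * Returns -3 if the DLL or symbol cannot be loaded, otherwise the
 * result of fork_and_reap(). */
int dlopen_call_then_fork(const char *dll_path, const char *symbol, long timeout_ms)
{
    void *handle = dlopen(dll_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -3;
    }
    void (*entry)(void);
    *(void **)(&entry) = dlsym(handle, symbol);
    if (entry == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -3;
    }
    entry();    /* e.g. a function that calls gtk_settings_get_default() */
    int rc = fork_and_reap(timeout_ms);
    dlclose(handle);
    return rc;
}
```

For example, dlopen_call_then_fork("./settings.dll", "call_settings", 2000) (both names hypothetical) would return 0 when the child exits normally and -2 when it had to be killed.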
The bug is not observed if the call is made in the main program, nor if GTK is initialized but gtk_settings_get_default() is not called.

Warning: if you run ./dlload.exe without the CYGWIN environment variable set to run dumper (which terminates the process), your system will accumulate zombie-like copies of dlload.exe that eat CPU. strace shows that these zombie processes hit exceptions repeatedly in an endless loop. The following strace output repeats forever after the fork:

    --- Process 9108 (pid: 10439), exception c0000005 at 00000003f5baa8e0
     1960   21097 [main] perl 10439 exception::handle: In cygwin_except_handler exception 0xC0000005 at 0x3F5BAA8E0 sp 0xFFFFC5A8
       16   21113 [main] perl 10439 exception::handle: In cygwin_except_handler signal 11 at 0x3F5BAA8E0
       14   21127 [main] perl 10439 try_to_debug: debugger_command 'dumper "./dlload.exe"'
       23   21150 [main] perl 10439 break_here: break here
       12   21162 [main] perl 10439 sig_send: sendsig 0x13C, pid 10439, signal 11, its_me 1
       14   21176 [main] perl 10439 sig_send: wakeup 0x3F4
       15   21191 [main] perl 10439 sig_send: Waiting for pack.wakeup 0x3F4
       19   21210 [sig] perl 10439 sigpacket::process: returning -1
       19   21229 [sig] perl 10439 wait_sig: signalling pack.wakeup 0x3F4
       17   21246 [main] perl 10439 sig_send: returning 0x0 from sending signal 11

I ran into this problem when random perl and python scripts started hanging (apparently waiting for a forked child that never exited); after interrupting them with ^C, I noticed the accumulated zombie processes.
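While one of the looping children is still alive, one possible way to narrow down the faulting address 0x3F5BAA8E0 is to check which module, if any, is mapped there. Cygwin provides a /proc/PID/maps file; the sketch below assumes its lines follow the Linux-style layout (start-end perms offset dev inode [path]) and that the path is the remainder of the line. The helper names are hypothetical, and the parsing may need adjusting if the real output differs.

```c
/*
 * Sketch: find which mapping in a /proc/PID/maps file contains a given
 * address, e.g. the faulting address 0x3F5BAA8E0 from the strace output.
 * Assumes each line follows the Linux-style layout
 *   start-end perms offset dev inode [path]
 * with start/end in hex; adjust the parsing if the actual output differs.
 */
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Parse one maps line. If addr lies in [start, end), copy the path
 * (possibly empty) into out and return 1; otherwise return 0. */
int maps_line_contains(const char *line, uint64_t addr, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL && outlen > 0);
    uint64_t start, end;
    int consumed = 0;
    /* start-end, then skip perms, offset, dev, inode; %n marks where the path begins */
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %*s %*s %*s %*s %n",
               &start, &end, &consumed) < 2 || consumed == 0)
        return 0;
    if (addr < start || addr >= end)
        return 0;
    /* The rest of the line, minus the trailing newline, is the module path. */
    const char *path = line + consumed;
    size_t n = strcspn(path, "\n");
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, path, n);
    out[n] = '\0';
    return 1;
}

/* Scan a whole maps file; returns 1 and fills out if some mapping covers addr. */
int find_module_for_address(const char *maps_path, uint64_t addr, char *out, size_t outlen)
{
    FILE *f = fopen(maps_path, "r");
    if (f == NULL)
        return 0;
    char line[4096];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = maps_line_contains(line, addr, out, outlen);
    fclose(f);
    return found;
}
```

For instance, find_module_for_address("/proc/10439/maps", 0x3F5BAA8E0, buf, sizeof buf) would report the mapping (if any) that covers the faulting address in the looping process from the strace above.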
The coredump written by dumper doesn't show the culprit, but it does show this:

    (gdb) bt
    #0  0x00007ffa4870d744 in ntdll!ZwDelayExecution () from C:/WINDOWS/SYSTEM32/ntdll.dll
    #1  0x00007ffa4601b03e in SleepEx () from C:/WINDOWS/System32/KERNELBASE.dll
    #2  0x000000018006205a in try_to_debug () from C:/cygwin64/bin/cygwin1.dll
    #3  0x00000001800624f6 in exception::handle(_EXCEPTION_RECORD*, void*, _CONTEXT*, _DISPATCHER_CONTEXT*) () from C:/cygwin64/bin/cygwin1.dll
    #4  0x00007ffa4871241f in ntdll!.chkstk () from C:/WINDOWS/SYSTEM32/ntdll.dll
    #5  0x00007ffa486c14a4 in ntdll!RtlRaiseException () from C:/WINDOWS/SYSTEM32/ntdll.dll
    #6  0x00007ffa48710f4e in ntdll!KiUserExceptionDispatcher () from C:/WINDOWS/SYSTEM32/ntdll.dll
    #7  0x00000003f5baa8e0 in ?? ()
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

which seems to indicate that the exception is somewhere in the Cygwin runtime. I haven't yet found out exactly where in the runtime the bug is, and I'd like to hear whether there are any smart strategies for doing that.

Nor did I succeed in reducing gtk_settings_get_default() to something more manageable (that call is already as far as I could reduce the case), even though I recompiled gtk3 locally. Strangely, its strace doesn't show anything suspicious: no forks, no open sockets, no pipe calls, just file opens (see strace.gsettings).

Kindly advise how I can help fix this; so far I'm a bit stuck. Otherwise, to reproduce, download and unpack http://karasik.eu.org/misc/cygwin/cygwin-gtk-dlopen-fork-bug.tar and run ./try there.

-- 
Sincerely,
Dmitry Karasik