* Odd hang in python waiting for child; strace wakes hung process?
From: Dan Kegel @ 2018-03-21 17:55 UTC
To: cygwin

Dear Cygwin heroes,

I've been happily using cygwin to run buildbot slaves for several
years. However, periodically they hang at the end of executing git.
It kind of smells like SIGCHLD isn't delivered somehow.

I shrugged it off as 'I'm using an ancient copy of twisted, it's
my fault' for a long time, but recently updated to latest twisted,
and it is still happening; today I came in and both of my windows
buildbots are hung with it.

While sniffing around at the edges of the problem, I noticed that the hung
process would resume if I simply ran strace on it and then hit ^C to
terminate strace.

Here's the output of cygcheck and strace:
http://kegel.com/linux/cyghang/
Older context at
https://github.com/buildbot/buildbot/issues/3801

I'll surely hit this problem again soon; what other info should I gather?

My next sensible step is to use an up to date version of buildbot's slave
(I'm running a very old one at the moment), but it seems kind of
fishy that strace could wake the process up.

Thanks,
Dan

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Achim Gratz @ 2018-03-21 19:20 UTC
To: cygwin

Dan Kegel writes:
> I've been happily using cygwin to run buildbot slaves for several
> years. However, periodically they hang at the end of executing git.
> It kind of smells like SIGCHLD isn't delivered somehow.

That smells like it's the same as a longstanding problem I have with
certain Perl tests and a few (a bit too complicated) Perl scripts at
work. I have never been able to reproduce it in a way that might enable
Corinna to have a look, unfortunately.

> My next sensible step is to use an up to date version of buildbot's slave
> (I'm running a very old one at the moment), but it seems kind of
> fishy that strace could wake the process up.

Well, with the sporadic hanging/defunct processes at work my routine is
to send CONT to all Cygwin processes, then HUP/KILL to anything that's
still not live or gone and then another round of CONT. This works
_most_ of the time, anything more stubborn I /bin/kill -f usually.

Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Dan Kegel @ 2018-03-21 22:09 UTC
To: cygwin

On Wed, Mar 21, 2018 at 11:54 AM, Achim Gratz <Stromeko@nexgo.de> wrote:
> Well, with the sporadic hanging/defunct processes at work my routine is
> to send CONT to all Cygwin processes, then HUP/KILL to anything that's
> still not live or gone and then another round of CONT. This works
> _most_ of the time, anything more stubborn I /bin/kill -f usually.

Since I wrote, both bots hung again. This time I verified:
0) both had the defunct git process as expected
1) kill -CONT the-pid doesn't do anything, nor does kill -9. It's truly wedged.
2) the process stays in its spin hang until you ^C strace -p the-pid
3) ^C-ing the strace causes the process to terminate (strace alone doesn't)
4) taskkill /pid the-task /f also kills the process successfully.

So I could write a script that watched for defunct git processes
and taskkilled their parent. Build jobs would fail, but at least
the bot would stay up. Of course it would be much nicer if
the cygwin python process didn't get wedged.

Alternately, I suppose I could try running native python...
or cygwin's python3... but dangit, kill -9 should work.
- Dan
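[Editorial sketch] The watchdog idea in the message above (watch for defunct children, then taskkill their parent) might look roughly like the following. This is only a sketch under stated assumptions: it assumes that Cygwin `ps` prints `<defunct>` in the COMMAND column for zombies and that the first four numeric columns of each row are PID, PPID, PGID and WINPID, so check both against `ps` output on your own machine. The names `find_wedged_parents`, `taskkill_command` and `main` are hypothetical and come from no tool mentioned in the thread.

```python
import subprocess


def find_wedged_parents(ps_text):
    """Return Windows PIDs of processes that have a <defunct> child.

    Assumed row layout (not verified for every Cygwin release):
        [status] PID PPID PGID WINPID TTY UID STIME COMMAND
    """
    rows = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Drop an optional one-letter status flag in the first column.
        if fields and not fields[0].isdigit():
            fields = fields[1:]
        # Skip the header and anything that does not look like a data row.
        if len(fields) < 8 or not all(f.isdigit() for f in fields[:4]):
            continue
        pid, ppid, _pgid, winpid = (int(f) for f in fields[:4])
        # Everything after the first seven fields is treated as the command.
        rows.append((pid, ppid, winpid, " ".join(fields[7:])))

    winpid_of = {pid: winpid for pid, _ppid, winpid, _cmd in rows}
    parents = []
    for _pid, ppid, _winpid, command in rows:
        if "<defunct>" in command:
            target = winpid_of.get(ppid)
            if target is not None and target not in parents:
                parents.append(target)
    return parents


def taskkill_command(winpid):
    # Same invocation as in the message: taskkill /pid <pid> /f
    return ["taskkill", "/pid", str(winpid), "/f"]


def main():
    # Dry run: print what would be killed instead of killing it.
    ps_text = subprocess.run(["ps"], capture_output=True, text=True).stdout
    for winpid in find_wedged_parents(ps_text):
        print("would run:", " ".join(taskkill_command(winpid)))
```

The sketch does not check that the defunct child is actually git, since the command name may not be visible for a defunct row. Calling `main()` periodically gives a dry run; passing the result of `taskkill_command` to `subprocess.run` would do the actual kill, at the cost of failing the build job as noted above.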
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Brian Inglis @ 2018-03-22 7:38 UTC
To: cygwin

On 2018-03-21 16:07, Dan Kegel wrote:
> Since I wrote, both bots hung again. This time I verified:
> 0) both had the defunct git process as expected
> 1) kill -CONT the-pid doesn't do anything, nor does kill -9. It's truly wedged.
> 2) the process stays in its spin hang until you ^C strace -p the-pid
> 3) ^C-ing the strace causes the process to terminate (strace alone doesn't)
> 4) taskkill /pid the-task /f also kills the process successfully.
>
> So I could write a script that watched for defunct git processes
> and taskkilled their parent. Build jobs would fail, but at least
> the bot would stay up. Of course it would be much nicer if
> the cygwin python process didn't get wedged.
>
> Alternately, I suppose I could try running native python...
> or cygwin's python3... but dangit, kill -9 should work.

Seems to be looping on access failure to a Windows mailslot; not sure what this
feature is normally used for: dmesg/syslog messages/AF_UNIX sockets?

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Corinna Vinschen @ 2018-03-22 17:10 UTC
To: cygwin

On Mar 21 23:41, Brian Inglis wrote:
> Seems to be looping on access failure to a Windows mailslot; not sure what this
> feature is normally used for: dmesg/syslog messages/AF_UNIX sockets?

/dev/kmsg is implemented using a mailslot under the hood. This
feature is only used to log exceptions and for nothing else, since
nobody ever found another use for it.

It would be interesting to learn if the perl hangs have the same cause.

I guess we can simply remove /dev/kmsg support completely and drop
the mailslot code. I'm pretty sure nobody would miss it. Hardly
anybody knows it exists...

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Dan Kegel @ 2018-03-22 21:05 UTC
To: cygwin

On Thu, Mar 22, 2018 at 9:59 AM, Corinna Vinschen
<corinna-cygwin@cygwin.com> wrote:
> I guess we can simply remove /dev/kmsg support completely and drop
> the mailslot code. I'm pretty sure nobody would miss it. Hardly
> anybody knows it exists...

I'd be happy to test (though building cygwin is not in the cards for
me for now).

But would it break this guy's use case (running syslog-ng on cygwin)?
http://kb.kaminskiengineering.com/node/382
https://web.archive.org/web/20110209222803/http://www.syslog.org/logged/running-syslog-ng-on-windows/
- Dan
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Brian Inglis @ 2018-03-24 1:54 UTC
To: cygwin

On 2018-03-22 10:59, Corinna Vinschen wrote:
> /dev/kmsg is implemented using a mailslot under the hood. This
> feature is only used to log exceptions and for nothing else, since
> nobody ever found another use for it.
>
> It would be interesting to learn if the perl hangs have the same cause.
>
> I guess we can simply remove /dev/kmsg support completely and drop
> the mailslot code. I'm pretty sure nobody would miss it. Hardly
> anybody knows it exists...

Is /dev/log implemented the same way?
Looks like syslog-ng stopped working around the last upgrade:

$ cat /var/log/syslog-ng.log
Error reading serialized data; error='Error reading file (short read)'
Persistent configuration file is in invalid format, ignoring;
Error binding socket; addr='AF_UNIX(/dev/log)', error='Address already in use (112)'
Error initializing source driver; source='s_local', id='s_local#0'
Error initializing message pipeline;
... [repeats]

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Corinna Vinschen @ 2018-03-25 18:12 UTC
To: cygwin

On Mar 23 19:21, Brian Inglis wrote:
> Is /dev/log implemented the same way?

No. /dev/log is an AF_UNIX socket.

> Looks like syslog-ng stopped working around the last upgrade:
>
> $ cat /var/log/syslog-ng.log
> Error reading serialized data; error='Error reading file (short read)'
> Persistent configuration file is in invalid format, ignoring;
> Error binding socket; addr='AF_UNIX(/dev/log)', error='Address already in use (112)'

rm -rf /dev/log

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Brian Inglis @ 2018-03-25 22:16 UTC
To: cygwin

On 2018-03-25 04:24, Corinna Vinschen wrote:
> On Mar 23 19:21, Brian Inglis wrote:
>> Is /dev/log implemented the same way?
>
> No. /dev/log is an AF_UNIX socket.
>
>> Looks like syslog-ng stopped working around the last upgrade:
>>
>> $ cat /var/log/syslog-ng.log
>> Error reading serialized data; error='Error reading file (short read)'
>> Persistent configuration file is in invalid format, ignoring;
>> Error binding socket; addr='AF_UNIX(/dev/log)', error='Address already in use (112)'
>
> rm -rf /dev/log

$ ll /dev/log && /bin/rm -f /dev/log && ll /dev/log
-rw-rw-rw- 1 SYSTEM SYSTEM 54 Nov 24 20:59 /dev/log
ls: cannot access '/dev/log': No such file or directory
[start services]
$ cyg-srv-status.sh
cron            Running
cygserver       Running
sendmail        Running
sshd            Running
syslog-ng       Running

Thank you very much!

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
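[Editorial sketch] The fix above amounts to deleting a leftover /dev/log file before syslog-ng tries to bind its AF_UNIX socket there. A small helper for a service start script might look like this. It is a sketch only: the function name `remove_stale_socket_file` is hypothetical, and it assumes the service that owns the socket is stopped when it runs, since it cannot tell a stale file from a live socket.

```python
import os


def remove_stale_socket_file(path="/dev/log"):
    """Delete a leftover socket file; return True if something was removed.

    Assumption: the owning service (syslog-ng here) is NOT running,
    so whatever sits at `path` is stale.
    """
    try:
        os.unlink(path)
    except FileNotFoundError:
        return False  # nothing to clean up
    return True
```

Returning a boolean lets the calling script log whether a cleanup actually happened before it starts the service.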
* Re: Odd hang in python waiting for child; strace wakes hung process?
From: Achim Gratz @ 2018-03-22 17:33 UTC
To: cygwin

Dan Kegel writes:
> 1) kill -CONT the-pid doesn't do anything, nor does kill -9. It's
> truly wedged.

No, you send CONT to all processes (like 'pkill -CONT .'). It seems
that this gives whatever was wedged enough of a kick to retry or time
out, and then it (often, not always) seems to start reaping processes
again and the <defuncts> start to vanish. You can tell you have one of
these when just doing a ps takes much longer than usual when it hits
those processes.

Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
http://Synth.Stromeko.net/Downloads.html#KorgSDada
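[Editorial sketch] The three-round routine described in this thread (CONT to everything, HUP/KILL to whatever is still stuck, then CONT again) could be written down roughly as follows. This is a sketch, not Achim's actual script: `kick_all` and the `still_stuck` callback are hypothetical, deciding which processes count as "still not live or gone" is left to the caller, and the code assumes a POSIX-style Python (such as Cygwin's) where `signal.SIGCONT`, `signal.SIGHUP` and `signal.SIGKILL` exist.

```python
import os
import signal


def kick_all(pids, still_stuck, send=os.kill):
    """Run the CONT / HUP+KILL / CONT routine; return the signals sent.

    pids        -- Cygwin PIDs to nudge
    still_stuck -- caller-supplied callable(pid) -> bool (hypothetical)
    send        -- signal sender, os.kill by default (replaceable for tests)
    """
    log = []

    def try_send(pid, sig):
        try:
            send(pid, sig)
            log.append((pid, sig))
        except ProcessLookupError:
            pass  # process is already gone

    # Round 1: CONT to every process.
    for pid in pids:
        try_send(pid, signal.SIGCONT)

    # Round 2: HUP, then KILL, for anything still not live or gone.
    for pid in pids:
        if still_stuck(pid):
            try_send(pid, signal.SIGHUP)
        if still_stuck(pid):
            try_send(pid, signal.SIGKILL)

    # Round 3: another round of CONT.
    for pid in pids:
        try_send(pid, signal.SIGCONT)
    return log
```

The last-resort `/bin/kill -f` step mentioned earlier in the thread is deliberately left out, as it goes through the Cygwin kill utility and not through ordinary signals.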