public inbox for cygwin@cygwin.com
* Odd hang in python waiting for child; strace wakes hung process?
@ 2018-03-21 17:55 Dan Kegel
  2018-03-21 19:20 ` Achim Gratz
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Kegel @ 2018-03-21 17:55 UTC (permalink / raw)
  To: cygwin

Dear Cygwin heroes,

I've been happily using cygwin to run buildbot slaves for several years.
However, they periodically hang at the end of executing git.  It kind of
smells like SIGCHLD isn't being delivered somehow.

I shrugged it off as 'I'm using an ancient copy of twisted, it's my fault' for a
long time, but recently updated to latest twisted, and it is still happening;
today I came in and both of my Windows buildbots are hung with it.

While sniffing around at the edges of the problem, I noticed that the
hung process would resume if I simply ran strace on it and then hit ^C
to terminate strace.

Here's the output of cygcheck and strace:
http://kegel.com/linux/cyghang/
Older context at
https://github.com/buildbot/buildbot/issues/3801

I'll surely hit this problem again soon; what other info should I gather?

My next sensible step is to use an up to date version of buildbot's slave
(I'm running a very old one at the moment), but it seems kind of
fishy that strace could wake the process up.

Thanks,
Dan

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-21 17:55 Odd hang in python waiting for child; strace wakes hung process? Dan Kegel
@ 2018-03-21 19:20 ` Achim Gratz
  2018-03-21 22:09   ` Dan Kegel
  0 siblings, 1 reply; 10+ messages in thread
From: Achim Gratz @ 2018-03-21 19:20 UTC (permalink / raw)
  To: cygwin

Dan Kegel writes:
> I've been happily using cygwin to run buildbot slaves for several
> years.  However, periodically they hang at the end of executing git.
> It kind of smells like SIGCHLD isn't delivered somehow.

That smells like it's the same as a longstanding problem I have with
certain Perl tests and a few (a bit too complicated) Perl scripts at
work.  I have never been able to reproduce it in a way that might enable
Corinna to have a look, unfortunately.

> My next sensible step is to use an up to date version of buildbot's slave
> (I'm running a very old one at the moment), but it seems kind of
> fishy that strace could wake the process up.

Well, with the sporadic hanging/defunct processes at work, my routine is
to send CONT to all Cygwin processes, then HUP/KILL to anything that's
still neither live nor gone, and then another round of CONT.  This works
_most_ of the time; anything more stubborn I usually /bin/kill -f.
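
The routine described above (CONT to everything, HUP then KILL to whatever
is still stuck, then CONT again) could be scripted roughly as follows.  This
is only a sketch: `unwedge` and the `is_stuck` predicate are hypothetical
names, and how to decide that a process is stuck is left to the caller,
since the thread does not say how that check is made.

```python
import os
import signal


def unwedge(pids, is_stuck, send=os.kill):
    """CONT all pids, HUP then KILL the stuck ones, CONT all again.

    `is_stuck` is a caller-supplied predicate (an assumption of this sketch).
    Returns the pids that are still stuck afterwards.
    """
    def broadcast(sig, targets):
        for pid in targets:
            try:
                send(pid, sig)
            except (ProcessLookupError, PermissionError):
                pass  # already gone, or not ours to signal

    broadcast(signal.SIGCONT, pids)
    # Escalate only for processes the caller still considers stuck.
    for sig in (signal.SIGHUP, signal.SIGKILL):
        broadcast(sig, [pid for pid in pids if is_stuck(pid)])
    broadcast(signal.SIGCONT, pids)
    return [pid for pid in pids if is_stuck(pid)]
```

Anything returned by `unwedge` would be a candidate for `/bin/kill -f`.  A
real script would probably want a short pause between the rounds.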


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-21 19:20 ` Achim Gratz
@ 2018-03-21 22:09   ` Dan Kegel
  2018-03-22  7:38     ` Brian Inglis
  2018-03-22 17:33     ` Achim Gratz
  0 siblings, 2 replies; 10+ messages in thread
From: Dan Kegel @ 2018-03-21 22:09 UTC (permalink / raw)
  To: cygwin

On Wed, Mar 21, 2018 at 11:54 AM, Achim Gratz <Stromeko@nexgo.de> wrote:
> Well, with the sporadic hanging/defunct processes at work my routine is
> to send CONT to all Cygwin processes, then HUP/KILL to anything that's
> still not live or gone and then another round of CONT.  This works
> _most_ of the time, anything more stubborn I /bin/kill -f usually.

Since I wrote, both bots hung again.  This time I verified:
0) both had the defunct git process as expected
1) kill -CONT the-pid doesn't do anything, nor does kill -9.  It's truly wedged.
2) the process stays in its spin hang until you ^C strace -p the-pid
3) ^C-ing the strace causes the process to terminate (strace alone doesn't)
4) taskkill /pid the-task /f also kills the process successfully.

So I could write a script that watched for defunct git processes
and taskkilled their parent.  Build jobs would fail, but at least
the bot would stay up.  Of course it would be much nicer if
the cygwin python process didn't get wedged.
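
A minimal sketch of the watchdog idea above, not a tested fix.  It assumes
`ps -l` style output with PID, PPID and WINPID columns and `<defunct>` in
the command column for zombies; the function names are made up for
illustration.  It treats any defunct child as a candidate rather than only
git, on the assumption that a zombie row no longer shows the original
command name.

```python
import subprocess


def defunct_parent_winpids(ps_output):
    """Return sorted Windows PIDs of processes that have a <defunct> child."""
    lines = ps_output.splitlines()
    header = lines[0].split()
    pid_col, ppid_col, winpid_col = (
        header.index(name) for name in ("PID", "PPID", "WINPID"))
    winpid_of = {}
    zombie_parents = set()
    for line in lines[1:]:
        fields = line.split()
        # Assumption: a row may start with a one-letter status flag; drop it.
        if fields and not fields[0].isdigit():
            fields = fields[1:]
        if len(fields) <= max(pid_col, ppid_col, winpid_col):
            continue  # blank or malformed row
        winpid_of[int(fields[pid_col])] = int(fields[winpid_col])
        if "<defunct>" in line:
            zombie_parents.add(int(fields[ppid_col]))
    return sorted(winpid_of[p] for p in zombie_parents if p in winpid_of)


def taskkill_all(winpids, run=subprocess.run):
    """Force-kill each Windows PID with taskkill, as in step 4 above."""
    for winpid in winpids:
        run(["taskkill", "/pid", str(winpid), "/f"], check=False)
```

Usage would be along the lines of feeding it the text of `ps -l` from a
periodic job and passing the result to `taskkill_all`.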

Alternately, I suppose I could try running native python...
or cygwin's python3... but dangit, kill -9 should work.
- Dan


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-21 22:09   ` Dan Kegel
@ 2018-03-22  7:38     ` Brian Inglis
  2018-03-22 17:10       ` Corinna Vinschen
  2018-03-22 17:33     ` Achim Gratz
  1 sibling, 1 reply; 10+ messages in thread
From: Brian Inglis @ 2018-03-22  7:38 UTC (permalink / raw)
  To: cygwin

On 2018-03-21 16:07, Dan Kegel wrote:
> On Wed, Mar 21, 2018 at 11:54 AM, Achim Gratz <Stromeko@nexgo.de> wrote:
>> Well, with the sporadic hanging/defunct processes at work my routine is
>> to send CONT to all Cygwin processes, then HUP/KILL to anything that's
>> still not live or gone and then another round of CONT.  This works
>> _most_ of the time, anything more stubborn I /bin/kill -f usually.
> 
> Since I wrote, both bots hung again.  This time I verified:
> 0) both had the defunct git process as expected
> 1) kill -CONT the-pid doesn't do anything, nor does kill -9.  It's truly wedged.
> 2) the process stays in its spin hang until you ^C strace -p the-pid
> 3) ^C-ing the strace causes the process to terminate (strace alone doesn't)
> 4) taskkill /pid the-task /f also kills the process successfully.
> 
> So I could write a script that watched for defunct git processes
> and taskkilled their parent.  Build jobs would fail, but at least
> the bot would stay up.  Of course it would be much nicer if
> the cygwin python process didn't get wedged.
> 
> Alternately, I suppose I could try running native python...
> or cygwin's python3... but dangit, kill -9 should work.

Seems to be looping on access failure to a Windows mailslot; not sure what this
feature is normally used for: dmesg/syslog messages/AF_UNIX sockets?

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-22  7:38     ` Brian Inglis
@ 2018-03-22 17:10       ` Corinna Vinschen
  2018-03-22 21:05         ` Dan Kegel
  2018-03-24  1:54         ` Brian Inglis
  0 siblings, 2 replies; 10+ messages in thread
From: Corinna Vinschen @ 2018-03-22 17:10 UTC (permalink / raw)
  To: cygwin


On Mar 21 23:41, Brian Inglis wrote:
> Seems to be looping on access failure to a Windows mailslot; not sure what this
> feature is normally used for: dmesg/syslog messages/AF_UNIX sockets?

/dev/kmsg is implemented using a mailslot under the hood.  This
feature is only used to log exceptions and nothing else, since
nobody ever found another use for it.

It would be interesting to learn if the Perl hangs have the same cause.

I guess we can simply remove /dev/kmsg support completely and drop
the mailslot code.  I'm pretty sure nobody would miss it.  Hardly
anybody knows it exists...


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-21 22:09   ` Dan Kegel
  2018-03-22  7:38     ` Brian Inglis
@ 2018-03-22 17:33     ` Achim Gratz
  1 sibling, 0 replies; 10+ messages in thread
From: Achim Gratz @ 2018-03-22 17:33 UTC (permalink / raw)
  To: cygwin

Dan Kegel writes:
> 1) kill -CONT the-pid doesn't do anything, nor does kill -9.  It's
> truly wedged.

No, you send CONT to all processes (like 'pkill -CONT .').  It seems that
this gives whatever was wedged enough of a kick to retry or time out, and
then it (often, not always) seems to start reaping processes again
and the <defunct>s start to vanish.  You can tell you have one of these
when just doing a ps takes much longer than usual when it hits those
processes.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
http://Synth.Stromeko.net/Downloads.html#KorgSDada


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-22 17:10       ` Corinna Vinschen
@ 2018-03-22 21:05         ` Dan Kegel
  2018-03-24  1:54         ` Brian Inglis
  1 sibling, 0 replies; 10+ messages in thread
From: Dan Kegel @ 2018-03-22 21:05 UTC (permalink / raw)
  To: cygwin

On Thu, Mar 22, 2018 at 9:59 AM, Corinna Vinschen
<corinna-cygwin@cygwin.com> wrote:
> I guess we can simply remove /dev/kmsg support completely and drop
> the mailslot code.  I'm pretty sure nobody would miss it.  Hardly
> anybody knows it exists...

I'd be happy to test (though building cygwin is not in the cards for me
for now).  But would it break this guy's use case (running syslog-ng on
cygwin)?
http://kb.kaminskiengineering.com/node/382
https://web.archive.org/web/20110209222803/http://www.syslog.org/logged/running-syslog-ng-on-windows/
- Dan


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-22 17:10       ` Corinna Vinschen
  2018-03-22 21:05         ` Dan Kegel
@ 2018-03-24  1:54         ` Brian Inglis
  2018-03-25 18:12           ` Corinna Vinschen
  1 sibling, 1 reply; 10+ messages in thread
From: Brian Inglis @ 2018-03-24  1:54 UTC (permalink / raw)
  To: cygwin

On 2018-03-22 10:59, Corinna Vinschen wrote:
> On Mar 21 23:41, Brian Inglis wrote:
>> Seems to be looping on access failure to a Windows mailslot; not sure what this
>> feature is normally used for: dmesg/syslog messages/AF_UNIX sockets?
> 
> /dev/kmsg is implemented using a mailslot under the hood.  This
> feature is only used to log exceptions and for nothing else since
> nobody ever found another reason to use it for.
> 
> It would be interesting to learn if the perl hangs have the same reason.
> 
> I guess we can simply remove /dev/kmsg support completely and drop
> the mailslot code.  I'm pretty sure nobody would miss it.  Hardly
> anybody knows it exists...

Is /dev/log implemented the same way?

Looks like syslog-ng stopped working around the last upgrade:

$ cat /var/log/syslog-ng.log
Error reading serialized data; error='Error reading file (short read)'
Persistent configuration file is in invalid format, ignoring;
Error binding socket; addr='AF_UNIX(/dev/log)', error='Address already in use (112)'
Error initializing source driver; source='s_local', id='s_local#0'
Error initializing message pipeline;
... [repeats]

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-24  1:54         ` Brian Inglis
@ 2018-03-25 18:12           ` Corinna Vinschen
  2018-03-25 22:16             ` Brian Inglis
  0 siblings, 1 reply; 10+ messages in thread
From: Corinna Vinschen @ 2018-03-25 18:12 UTC (permalink / raw)
  To: cygwin


On Mar 23 19:21, Brian Inglis wrote:
> On 2018-03-22 10:59, Corinna Vinschen wrote:
> > On Mar 21 23:41, Brian Inglis wrote:
> >> Seems to be looping on access failure to a Windows mailslot; not sure what this
> >> feature is normally used for: dmesg/syslog messages/AF_UNIX sockets?
> > 
> > /dev/kmsg is implemented using a mailslot under the hood.  This
> > feature is only used to log exceptions and for nothing else since
> > nobody ever found another reason to use it for.
> > 
> > It would be interesting to learn if the perl hangs have the same reason.
> > 
> > I guess we can simply remove /dev/kmsg support completely and drop
> > the mailslot code.  I'm pretty sure nobody would miss it.  Hardly
> > anybody knows it exists...
> 
> Is /dev/log implemented the same way?

No.  /dev/log is an AF_UNIX socket.

> Looks like syslog-ng stopped working around the last upgrade:
> 
> $ cat /var/log/syslog-ng.log
> Error reading serialized data; error='Error reading file (short read)'
> Persistent configuration file is in invalid format, ignoring;
> Error binding socket; addr='AF_UNIX(/dev/log)', error='Address already in use (112)'

rm -rf /dev/log
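
For AF_UNIX sockets, "Address already in use" on bind generally means the
path already exists on disk, typically left behind by a daemon that did not
clean up.  A guarded version of the removal could look like the sketch
below, which only unlinks the path when nothing accepts a connection on it.
The helper name `remove_if_stale` is hypothetical, the probe uses a stream
socket as an assumption, and behaviour on Cygwin has not been verified here.

```python
import os
import socket


def remove_if_stale(path):
    """Unlink `path` if nothing is listening on it; return True if removed."""
    if not os.path.lexists(path):
        return False
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        os.unlink(path)  # nobody is listening: treat as a leftover
        return True
    except OSError:
        return False  # e.g. a live socket of another type, or no permission
    finally:
        probe.close()
    return False  # a live server answered; leave it alone
```

Calling `remove_if_stale("/dev/log")` before starting syslog-ng would be the
equivalent of the `rm` above, minus the risk of deleting a socket that is in
use.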


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: Odd hang in python waiting for child; strace wakes hung process?
  2018-03-25 18:12           ` Corinna Vinschen
@ 2018-03-25 22:16             ` Brian Inglis
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Inglis @ 2018-03-25 22:16 UTC (permalink / raw)
  To: cygwin

On 2018-03-25 04:24, Corinna Vinschen wrote:
> On Mar 23 19:21, Brian Inglis wrote:
>> On 2018-03-22 10:59, Corinna Vinschen wrote:
>>> On Mar 21 23:41, Brian Inglis wrote:
>>>> Seems to be looping on access failure to a Windows mailslot; not sure what this
>>>> feature is normally used for: dmesg/syslog messages/AF_UNIX sockets?
>>>
>>> /dev/kmsg is implemented using a mailslot under the hood.  This
>>> feature is only used to log exceptions and for nothing else since
>>> nobody ever found another reason to use it for.
>>>
>>> It would be interesting to learn if the perl hangs have the same reason.
>>>
>>> I guess we can simply remove /dev/kmsg support completely and drop
>>> the mailslot code.  I'm pretty sure nobody would miss it.  Hardly
>>> anybody knows it exists...
>>
>> Is /dev/log implemented the same way?
> 
> No.  /dev/log is a AF_UNIX socket.
> 
>> Looks like syslog-ng stopped working around the last upgrade:
>>
>> $ cat /var/log/syslog-ng.log
>> Error reading serialized data; error='Error reading file (short read)'
>> Persistent configuration file is in invalid format, ignoring;
>> Error binding socket; addr='AF_UNIX(/dev/log)', error='Address already in use (112)'
> 
> rm -rf /dev/log

$ ll /dev/log && /bin/rm -f /dev/log && ll /dev/log
-rw-rw-rw- 1 SYSTEM SYSTEM 54 Nov 24 20:59 /dev/log
ls: cannot access '/dev/log': No such file or directory
[start services]
$ cyg-srv-status.sh
cron Running   cygserver Running   sendmail Running   sshd Running   syslog-ng Running

Thank you very much!

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

