* 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
@ 2016-08-11 13:39 Erik Bray
2016-08-11 16:55 ` Corinna Vinschen
0 siblings, 1 reply; 5+ messages in thread
From: Erik Bray @ 2016-08-11 13:39 UTC (permalink / raw)
To: cygwin
Hi all,
This is a followup to a report back in 2011 about essentially the same issue:
https://cygwin.com/ml/cygwin/2011-04/msg00031.html
The same test program in that report demonstrates the issue, but with
kill sending any non-zero signal. To reiterate, the problem here is
POSIX compliance with respect to sending signals to zombie processes:
> Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
> terminated process that has not been waited for by its parent). Some indicate success on such a
> call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
> of process lifetime in this volume of POSIX.1-2008 covers inactive processes, the [ESRCH] error
> as described is inappropriate in this case. In particular, this means that an application cannot
> have a parent process check for termination of a particular child with kill(). (Usually this is done
> with the null signal; this can be done reliably with waitpid().)
In response to the originally issue, this was fixed *specifically* for
the case of kill(pid, 0). But my reading of the above is that kill()
should return 0 in this case regardless of the signal (modulo
permissions, etc.). On Linux, for example, when calling kill with pid
of a zombie process the kernel will happily deliver the signal to the
relevant task_struct; it will just never be acted on since the task
will never run again.
The below (untested) patch demonstrates the change I'm suggesting,
though I don't know what other code, if any, might be involved in
this.
Please CC me on any replies since I'm not subscribed to the list.
Thanks,
Erik
diff --git a/winsup/cygwin/signal.cc b/winsup/cygwin/signal.cc
index ff101e3..d819e77 100644
--- a/winsup/cygwin/signal.cc
+++ b/winsup/cygwin/signal.cc
@@ -260,7 +260,7 @@ _pinfo::kill (siginfo_t& si)
}
this_pid = pid;
}
- else if (si.si_signo == 0 && this && process_state == PID_EXITED)
+ else if (this && process_state == PID_EXITED)
{
this_process_state = process_state;
this_pid = pid;
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
2016-08-11 13:39 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0 Erik Bray
@ 2016-08-11 16:55 ` Corinna Vinschen
2016-08-11 17:26 ` Corinna Vinschen
2016-08-12 10:39 ` Erik Bray
0 siblings, 2 replies; 5+ messages in thread
From: Corinna Vinschen @ 2016-08-11 16:55 UTC (permalink / raw)
To: cygwin; +Cc: Erik Bray
[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]
Hi Eric,
On Aug 11 11:51, Erik Bray wrote:
> [...]
> > Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
> > terminated process that has not been waited for by its parent). Some indicate success on such a
> > call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
> > of process lifetime in this volume of POSIX.1-2008 covers inactive processes, the [ESRCH] error
> > as described is inappropriate in this case. In particular, this means that an application cannot
> > have a parent process check for termination of a particular child with kill(). (Usually this is done
> > with the null signal; this can be done reliably with waitpid().)
>
> In response to the originally issue, this was fixed *specifically* for
> the case of kill(pid, 0). But my reading of the above is that kill()
> should return 0 in this case regardless of the signal (modulo
> permissions, etc.). On Linux, for example, when calling kill with pid
> of a zombie process the kernel will happily deliver the signal to the
> relevant task_struct; it will just never be acted on since the task
> will never run again.
I'm not sure why cgf only fixed that for sig 0 at the time, since, as
you noted, the text from POSIX-1.2008 does not state that this is
*restricted* to sig 0.
> The below (untested) patch demonstrates the change I'm suggesting,
> though I don't know what other code, if any, might be involved in
> this.
The original patch laid the groundwork by making sure that there are
two states, EXITED and REAPED. Removing the explicit check for 0 is
the right thing to do, afaics, so I tested and applied your patch as is,
see
https://cygwin.com/git/?p=newlib-cygwin.git;a=commitdiff;h=86f79af827729f3968d8b3b8f860ac29d200da0d
Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
2016-08-11 16:55 ` Corinna Vinschen
@ 2016-08-11 17:26 ` Corinna Vinschen
2016-08-12 10:39 ` Erik Bray
1 sibling, 0 replies; 5+ messages in thread
From: Corinna Vinschen @ 2016-08-11 17:26 UTC (permalink / raw)
To: cygwin, Erik Bray
[-- Attachment #1: Type: text/plain, Size: 2332 bytes --]
On Aug 11 16:13, Corinna Vinschen wrote:
> Hi Eric,
Oops, Eri*k*.
Sorry,
Corinna
>
> On Aug 11 11:51, Erik Bray wrote:
> > [...]
> > > Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
> > > terminated process that has not been waited for by its parent). Some indicate success on such a
> > > call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
> > > of process lifetime in this volume of POSIX.1-2008 covers inactive processes, the [ESRCH] error
> > > as described is inappropriate in this case. In particular, this means that an application cannot
> > > have a parent process check for termination of a particular child with kill(). (Usually this is done
> > > with the null signal; this can be done reliably with waitpid().)
> >
> > In response to the originally issue, this was fixed *specifically* for
> > the case of kill(pid, 0). But my reading of the above is that kill()
> > should return 0 in this case regardless of the signal (modulo
> > permissions, etc.). On Linux, for example, when calling kill with pid
> > of a zombie process the kernel will happily deliver the signal to the
> > relevant task_struct; it will just never be acted on since the task
> > will never run again.
>
> I'm not sure why cgf only fixed that for sig 0 at the time, since, as
> you noted, the text from POSIX-1.2008 does not state that this is
> *restricted* to sig 0.
>
> > The below (untested) patch demonstrates the change I'm suggesting,
> > though I don't know what other code, if any, might be involved in
> > this.
>
> The original patch laid the groundwork by making sure that there are
> two states, EXITED and REAPED. Removing the explicit check for 0 is
> the right thing to do, afaics, so I tested and applied your patch as is,
> see
> https://cygwin.com/git/?p=newlib-cygwin.git;a=commitdiff;h=86f79af827729f3968d8b3b8f860ac29d200da0d
>
>
> Thanks,
> Corinna
>
> --
> Corinna Vinschen Please, send mails regarding Cygwin to
> Cygwin Maintainer cygwin AT cygwin DOT com
> Red Hat
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
2016-08-11 16:55 ` Corinna Vinschen
2016-08-11 17:26 ` Corinna Vinschen
@ 2016-08-12 10:39 ` Erik Bray
2016-08-12 11:05 ` Corinna Vinschen
1 sibling, 1 reply; 5+ messages in thread
From: Erik Bray @ 2016-08-12 10:39 UTC (permalink / raw)
To: cygwin, Erik Bray
On Thu, Aug 11, 2016 at 4:13 PM, Corinna Vinschen
<corinna-cygwin@cygwin.com> wrote:
> Hi Eric,
>
> On Aug 11 11:51, Erik Bray wrote:
>> [...]
>> > Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
>> > terminated process that has not been waited for by its parent). Some indicate success on such a
>> > call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
>> > of process lifetime in this volume of POSIX.1-2008 covers inactive processes, the [ESRCH] error
>> > as described is inappropriate in this case. In particular, this means that an application cannot
>> > have a parent process check for termination of a particular child with kill(). (Usually this is done
>> > with the null signal; this can be done reliably with waitpid().)
>>
>> In response to the originally issue, this was fixed *specifically* for
>> the case of kill(pid, 0). But my reading of the above is that kill()
>> should return 0 in this case regardless of the signal (modulo
>> permissions, etc.). On Linux, for example, when calling kill with pid
>> of a zombie process the kernel will happily deliver the signal to the
>> relevant task_struct; it will just never be acted on since the task
>> will never run again.
>
> I'm not sure why cgf only fixed that for sig 0 at the time, since, as
> you noted, the text from POSIX-1.2008 does not state that this is
> *restricted* to sig 0.
Yep, my thoughts exactly.
>> The below (untested) patch demonstrates the change I'm suggesting,
>> though I don't know what other code, if any, might be involved in
>> this.
>
> The original patch laid the groundwork by making sure that there are
> two states, EXITED and REAPED. Removing the explicit check for 0 is
> the right thing to do, afaics, so I tested and applied your patch as is,
> see
> https://cygwin.com/git/?p=newlib-cygwin.git;a=commitdiff;h=86f79af827729f3968d8b3b8f860ac29d200da0d
Thank you so much! I think this is much better behavior.
(And thanks for the correction of my name--I'm used to it being
misspelled but rarely do people notice and correct :)
Best,
Erik
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
2016-08-12 10:39 ` Erik Bray
@ 2016-08-12 11:05 ` Corinna Vinschen
0 siblings, 0 replies; 5+ messages in thread
From: Corinna Vinschen @ 2016-08-12 11:05 UTC (permalink / raw)
To: cygwin
[-- Attachment #1: Type: text/plain, Size: 1971 bytes --]
Hi Erik,
On Aug 12 10:44, Erik Bray wrote:
> On Thu, Aug 11, 2016 at 4:13 PM, Corinna Vinschen
> <corinna-cygwin@cygwin.com> wrote:
> > On Aug 11 11:51, Erik Bray wrote:
> >> [...]
> >> In response to the originally issue, this was fixed *specifically* for
> >> the case of kill(pid, 0). But my reading of the above is that kill()
> >> should return 0 in this case regardless of the signal (modulo
> >> permissions, etc.). On Linux, for example, when calling kill with pid
> >> of a zombie process the kernel will happily deliver the signal to the
> >> relevant task_struct; it will just never be acted on since the task
> >> will never run again.
> >
> > I'm not sure why cgf only fixed that for sig 0 at the time, since, as
> > you noted, the text from POSIX-1.2008 does not state that this is
> > *restricted* to sig 0.
>
> Yep, my thoughts exactly.
Funny that nobody noticed at the time, even thought the full POSIX text
was quoted.
> >> The below (untested) patch demonstrates the change I'm suggesting,
> >> though I don't know what other code, if any, might be involved in
> >> this.
> >
> > The original patch laid the groundwork by making sure that there are
> > two states, EXITED and REAPED. Removing the explicit check for 0 is
> > the right thing to do, afaics, so I tested and applied your patch as is,
> > see
> > https://cygwin.com/git/?p=newlib-cygwin.git;a=commitdiff;h=86f79af827729f3968d8b3b8f860ac29d200da0d
>
> Thank you so much! I think this is much better behavior.
>
> (And thanks for the correction of my name--I'm used to it being
> misspelled but rarely do people notice and correct :)
I belong to the misspelled kind myself (Corrina? No, Corinna! Can't
you read?), so I was quite embarrassed when I did it myself :}
Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-08-12 10:14 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-11 13:39 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0 Erik Bray
2016-08-11 16:55 ` Corinna Vinschen
2016-08-11 17:26 ` Corinna Vinschen
2016-08-12 10:39 ` Erik Bray
2016-08-12 11:05 ` Corinna Vinschen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).