public inbox for cygwin@cygwin.com
 help / color / mirror / Atom feed
* 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
@ 2016-08-11 13:39 Erik Bray
  2016-08-11 16:55 ` Corinna Vinschen
  0 siblings, 1 reply; 5+ messages in thread
From: Erik Bray @ 2016-08-11 13:39 UTC (permalink / raw)
  To: cygwin

Hi all,

This is a followup to a report back in 2011 about essentially the same issue:

https://cygwin.com/ml/cygwin/2011-04/msg00031.html

The same test program in that report demonstrates the issue, but with
kill sending any non-zero signal.  To reiterate, the problem here is
POSIX compliance with respect to sending signals to zombie processes:

> Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
> terminated process that has not been waited for by its parent). Some indicate success on such a
> call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
> of process lifetime in this volume of POSIX.1-2008 covers inactive processes, the [ESRCH] error
> as described is inappropriate in this case. In particular, this means that an application cannot
> have a parent process check for termination of a particular child with kill(). (Usually this is done
> with the null signal; this can be done reliably with waitpid().)

In response to the originally issue, this was fixed *specifically* for
the case of kill(pid, 0).  But my reading of the above is that kill()
should return 0 in this case regardless of the signal (modulo
permissions, etc.).  On Linux, for example, when calling kill with pid
of a zombie process the kernel will happily deliver the signal to the
relevant task_struct; it will just never be acted on since the task
will never run again.

The below (untested) patch demonstrates the change I'm suggesting,
though I don't know what other code, if any, might be involved in
this.

Please CC me on any replies since I'm not subscribed to the list.

Thanks,
Erik

diff --git a/winsup/cygwin/signal.cc b/winsup/cygwin/signal.cc
index ff101e3..d819e77 100644
--- a/winsup/cygwin/signal.cc
+++ b/winsup/cygwin/signal.cc
@@ -260,7 +260,7 @@ _pinfo::kill (siginfo_t& si)
        }
       this_pid = pid;
     }
-  else if (si.si_signo == 0 && this && process_state == PID_EXITED)
+  else if (this && process_state == PID_EXITED)
     {
       this_process_state = process_state;
       this_pid = pid;

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
  2016-08-11 13:39 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0 Erik Bray
@ 2016-08-11 16:55 ` Corinna Vinschen
  2016-08-11 17:26   ` Corinna Vinschen
  2016-08-12 10:39   ` Erik Bray
  0 siblings, 2 replies; 5+ messages in thread
From: Corinna Vinschen @ 2016-08-11 16:55 UTC (permalink / raw)
  To: cygwin; +Cc: Erik Bray

[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]

Hi Eric,

On Aug 11 11:51, Erik Bray wrote:
> [...]
> > Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
> > terminated process that has not been waited for by its parent). Some indicate success on such a
> > call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
> > of process lifetime in this volume of POSIX.1-2008 covers inactive processes, the [ESRCH] error
> > as described is inappropriate in this case. In particular, this means that an application cannot
> > have a parent process check for termination of a particular child with kill(). (Usually this is done
> > with the null signal; this can be done reliably with waitpid().)
> 
> In response to the originally issue, this was fixed *specifically* for
> the case of kill(pid, 0).  But my reading of the above is that kill()
> should return 0 in this case regardless of the signal (modulo
> permissions, etc.).  On Linux, for example, when calling kill with pid
> of a zombie process the kernel will happily deliver the signal to the
> relevant task_struct; it will just never be acted on since the task
> will never run again.

I'm not sure why cgf only fixed that for sig 0 at the time, since, as
you noted, the text from POSIX-1.2008 does not state that this is
*restricted* to sig 0.

> The below (untested) patch demonstrates the change I'm suggesting,
> though I don't know what other code, if any, might be involved in
> this.

The original patch laid the groundwork by making sure that there are
two states, EXITED and REAPED.  Removing the explicit check for 0 is
the right thing to do, afaics, so I tested and applied your patch as is,
see 
https://cygwin.com/git/?p=newlib-cygwin.git;a=commitdiff;h=86f79af827729f3968d8b3b8f860ac29d200da0d


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
  2016-08-11 16:55 ` Corinna Vinschen
@ 2016-08-11 17:26   ` Corinna Vinschen
  2016-08-12 10:39   ` Erik Bray
  1 sibling, 0 replies; 5+ messages in thread
From: Corinna Vinschen @ 2016-08-11 17:26 UTC (permalink / raw)
  To: cygwin, Erik Bray

[-- Attachment #1: Type: text/plain, Size: 2332 bytes --]

On Aug 11 16:13, Corinna Vinschen wrote:
> Hi Eric,

Oops, Eri*k*.


Sorry,
Corinna

> 
> On Aug 11 11:51, Erik Bray wrote:
> > [...]
> > > Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
> > > terminated process that has not been waited for by its parent). Some indicate success on such a
> > > call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
> > > of process lifetime in this volume of POSIX.1-2008 covers inactive processes, the [ESRCH] error
> > > as described is inappropriate in this case. In particular, this means that an application cannot
> > > have a parent process check for termination of a particular child with kill(). (Usually this is done
> > > with the null signal; this can be done reliably with waitpid().)
> > 
> > In response to the originally issue, this was fixed *specifically* for
> > the case of kill(pid, 0).  But my reading of the above is that kill()
> > should return 0 in this case regardless of the signal (modulo
> > permissions, etc.).  On Linux, for example, when calling kill with pid
> > of a zombie process the kernel will happily deliver the signal to the
> > relevant task_struct; it will just never be acted on since the task
> > will never run again.
> 
> I'm not sure why cgf only fixed that for sig 0 at the time, since, as
> you noted, the text from POSIX-1.2008 does not state that this is
> *restricted* to sig 0.
> 
> > The below (untested) patch demonstrates the change I'm suggesting,
> > though I don't know what other code, if any, might be involved in
> > this.
> 
> The original patch laid the groundwork by making sure that there are
> two states, EXITED and REAPED.  Removing the explicit check for 0 is
> the right thing to do, afaics, so I tested and applied your patch as is,
> see 
> https://cygwin.com/git/?p=newlib-cygwin.git;a=commitdiff;h=86f79af827729f3968d8b3b8f860ac29d200da0d
> 
> 
> Thanks,
> Corinna
> 
> -- 
> Corinna Vinschen                  Please, send mails regarding Cygwin to
> Cygwin Maintainer                 cygwin AT cygwin DOT com
> Red Hat



-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
  2016-08-11 16:55 ` Corinna Vinschen
  2016-08-11 17:26   ` Corinna Vinschen
@ 2016-08-12 10:39   ` Erik Bray
  2016-08-12 11:05     ` Corinna Vinschen
  1 sibling, 1 reply; 5+ messages in thread
From: Erik Bray @ 2016-08-12 10:39 UTC (permalink / raw)
  To: cygwin, Erik Bray

On Thu, Aug 11, 2016 at 4:13 PM, Corinna Vinschen
<corinna-cygwin@cygwin.com> wrote:
> Hi Eric,
>
> On Aug 11 11:51, Erik Bray wrote:
>> [...]
>> > Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
>> > terminated process that has not been waited for by its parent). Some indicate success on such a
>> > call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
>> > of process lifetime in this volume of POSIX.1-2008 covers inactive processes, the [ESRCH] error
>> > as described is inappropriate in this case. In particular, this means that an application cannot
>> > have a parent process check for termination of a particular child with kill(). (Usually this is done
>> > with the null signal; this can be done reliably with waitpid().)
>>
>> In response to the originally issue, this was fixed *specifically* for
>> the case of kill(pid, 0).  But my reading of the above is that kill()
>> should return 0 in this case regardless of the signal (modulo
>> permissions, etc.).  On Linux, for example, when calling kill with pid
>> of a zombie process the kernel will happily deliver the signal to the
>> relevant task_struct; it will just never be acted on since the task
>> will never run again.
>
> I'm not sure why cgf only fixed that for sig 0 at the time, since, as
> you noted, the text from POSIX-1.2008 does not state that this is
> *restricted* to sig 0.

Yep, my thoughts exactly.

>> The below (untested) patch demonstrates the change I'm suggesting,
>> though I don't know what other code, if any, might be involved in
>> this.
>
> The original patch laid the groundwork by making sure that there are
> two states, EXITED and REAPED.  Removing the explicit check for 0 is
> the right thing to do, afaics, so I tested and applied your patch as is,
> see
> https://cygwin.com/git/?p=newlib-cygwin.git;a=commitdiff;h=86f79af827729f3968d8b3b8f860ac29d200da0d

Thank you so much!  I think this is much better behavior.

(And thanks for the correction of my name--I'm used to it being
misspelled but rarely do people notice and correct :)

Best,
Erik

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0
  2016-08-12 10:39   ` Erik Bray
@ 2016-08-12 11:05     ` Corinna Vinschen
  0 siblings, 0 replies; 5+ messages in thread
From: Corinna Vinschen @ 2016-08-12 11:05 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1971 bytes --]

Hi Erik,

On Aug 12 10:44, Erik Bray wrote:
> On Thu, Aug 11, 2016 at 4:13 PM, Corinna Vinschen
> <corinna-cygwin@cygwin.com> wrote:
> > On Aug 11 11:51, Erik Bray wrote:
> >> [...]
> >> In response to the originally issue, this was fixed *specifically* for
> >> the case of kill(pid, 0).  But my reading of the above is that kill()
> >> should return 0 in this case regardless of the signal (modulo
> >> permissions, etc.).  On Linux, for example, when calling kill with pid
> >> of a zombie process the kernel will happily deliver the signal to the
> >> relevant task_struct; it will just never be acted on since the task
> >> will never run again.
> >
> > I'm not sure why cgf only fixed that for sig 0 at the time, since, as
> > you noted, the text from POSIX-1.2008 does not state that this is
> > *restricted* to sig 0.
> 
> Yep, my thoughts exactly.

Funny that nobody noticed at the time, even thought the full POSIX text
was quoted.

> >> The below (untested) patch demonstrates the change I'm suggesting,
> >> though I don't know what other code, if any, might be involved in
> >> this.
> >
> > The original patch laid the groundwork by making sure that there are
> > two states, EXITED and REAPED.  Removing the explicit check for 0 is
> > the right thing to do, afaics, so I tested and applied your patch as is,
> > see
> > https://cygwin.com/git/?p=newlib-cygwin.git;a=commitdiff;h=86f79af827729f3968d8b3b8f860ac29d200da0d
> 
> Thank you so much!  I think this is much better behavior.
> 
> (And thanks for the correction of my name--I'm used to it being
> misspelled but rarely do people notice and correct :)

I belong to the misspelled kind myself (Corrina?  No, Corinna!  Can't
you read?), so I was quite embarrassed when I did it myself :}


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-08-12 10:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-11 13:39 2.5.1: kill(pid, sig) before waitpid() returns -1 for sig != 0 Erik Bray
2016-08-11 16:55 ` Corinna Vinschen
2016-08-11 17:26   ` Corinna Vinschen
2016-08-12 10:39   ` Erik Bray
2016-08-12 11:05     ` Corinna Vinschen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).