Re: bug#14569: 24.3.50; bootstrap fails on Cygwin

public inbox for cygwin@cygwin.com

* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
       [not found]                                 ` <BEC82502-E9FD-4F8E-B91E-F680F6885FB2@swipnet.se>
@ 2013-06-14 17:45                                   ` Paul Eggert
  2013-06-14 18:04                                     ` Christopher Faylor
  2013-06-14 18:16                                     ` Eli Zaretskii
  0 siblings, 2 replies; 10+ messages in thread
From: Paul Eggert @ 2013-06-14 17:45 UTC (permalink / raw)
  To: cygwin; +Cc: 14569

Cygwin developers, I'm worried about a Cygwin bug where
pthread_kill may not send a signal to the correct thread.
This bug may be causing Emacs to crash.  The Cygwin bug is
discussed in this thread:

http://cygwin.com/ml/cygwin/2012-05/msg00472.html

Emacs uses pthread_kill to redirect
SIGCHLD to the main thread; if this is sent to a random
thread instead, that could explain the random crashes.

My question is: does this bug still exist with Cygwin,
and if so is it likely to get fixed soon?

More details about the Emacs bug can be found here:

  http://bugs.gnu.org/14569

Briefly, Emacs has been crashing randomly on Cygwin ever since it started
doing this:

  /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
     this should always fail, but is enough to initialize glib's
     private SIGCHLD handler.  */
  g_source_unref (g_child_watch_source_new (getpid ()));

After this newly-inserted code, Emacs finds out what the
child signal handler was:

  /* Now, find out what glib's signal handler was, and store it
     into lib_child_handler.  */
  struct sigaction action, old_action;
  emacs_sigaction_init (&action, deliver_child_signal);
  sigaction (SIGCHLD, &action, &old_action);
  eassert (! (old_action.sa_flags & SA_SIGINFO));
  if (old_action.sa_handler != SIG_DFL && old_action.sa_handler != SIG_IGN
      && old_action.sa_handler != deliver_child_signal)
    lib_child_handler = old_action.sa_handler;

Emacs's SIGCHLD handler, deliver_child_signal, arranges for the
signal handling to occur in the main thread (to avoid races
within Emacs), like this:

  int old_errno = errno;
  bool on_main_thread = true;
  if (! pthread_equal (pthread_self (), main_thread))
    {
      sigset_t blocked;
      sigemptyset (&blocked);
      sigaddset (&blocked, sig);
      pthread_sigmask (SIG_BLOCK, &blocked, 0);
      pthread_kill (main_thread, sig);
      on_main_thread = false;
    }
  if (on_main_thread)
    handle_child_signal (sig);
  errno = old_errno;

And handle_child_signal, which runs in the main thread, does
a bunch of Emacsish things and then invokes lib_child_handler (sig),
which is glib's SIGCHLD handler.

All this works just fine on Fedora and other platforms; but it
doesn't work on Cygwin.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
  2013-06-14 17:45                                   ` bug#14569: 24.3.50; bootstrap fails on Cygwin Paul Eggert
@ 2013-06-14 18:04                                     ` Christopher Faylor
  2013-06-14 20:30                                       ` Ken Brown
  2013-06-15  6:01                                       ` Paul Eggert
  2013-06-14 18:16                                     ` Eli Zaretskii
  1 sibling, 2 replies; 10+ messages in thread
From: Christopher Faylor @ 2013-06-14 18:04 UTC (permalink / raw)
  To: cygwin

On Fri, Jun 14, 2013 at 10:45:47AM -0700, Paul Eggert wrote:
>Cygwin developers, I'm worried about a Cygwin bug where
>pthread_kill may not send a signal to the correct thread.
>This bug may be causing Emacs to crash.  The Cygwin bug is
>discussed in this thread:
>
>http://cygwin.com/ml/cygwin/2012-05/msg00472.html
>
>Emacs uses pthread_kill to redirect
>SIGCHLD to the main thread; if this is sent to a random
>thread instead, that could explain the random crashes.
>
>My question is: does this bug still exist with Cygwin,
>and if so is it likely to get fixed soon?

You pointed to an archived mail message which implies that was fixed
more than a year ago.  What makes you think it is still a problem?

I'd expect that if it was still a problem our emacs maintainer would
be on top of it.


* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
  2013-06-14 17:45                                   ` bug#14569: 24.3.50; bootstrap fails on Cygwin Paul Eggert
  2013-06-14 18:04                                     ` Christopher Faylor
@ 2013-06-14 18:16                                     ` Eli Zaretskii
  1 sibling, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2013-06-14 18:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: cygwin, 14569

> Date: Fri, 14 Jun 2013 10:45:47 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> Cc: 14569@debbugs.gnu.org
> 
> Cygwin developers, I'm worried about a Cygwin bug where
> pthread_kill may not send a signal to the correct thread.
> This bug may be causing Emacs to crash.  The Cygwin bug is
> discussed in this thread:
> 
> http://cygwin.com/ml/cygwin/2012-05/msg00472.html

Caveat: I'm not a Cygwin developer, and don't even use Cygwin.

> Emacs uses pthread_kill to redirect
> SIGCHLD to the main thread; if this is sent to a random
> thread instead, that could explain the random crashes.

It should be easy to instrument deliver_child_signal so that it prints
something when it redirects SIGCHLD, and then the Cygwin users could
see if there's such a report immediately before the crash, or at all.
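
A minimal sketch of such instrumentation, assuming the report goes to a
file descriptor via write, which POSIX lists as async-signal-safe.  The
names note_redirect, redirect_msg and check_note_redirect and the message
text are hypothetical; none of them come from the Emacs sources.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message text; any fixed string would do.  */
static const char redirect_msg[] =
  "deliver_child_signal: redirecting SIGCHLD to main thread\n";

/* Report a redirect on FD.  Only write is used because POSIX lists
   it as async-signal-safe, which fprintf is not.  */
static ssize_t
note_redirect (int fd)
{
  int old_errno = errno;
  ssize_t n = write (fd, redirect_msg, sizeof redirect_msg - 1);
  errno = old_errno;
  return n;
}

/* Usage check, run outside any signal handler: the note should come
   back intact through a pipe.  Returns 0 on success.  */
static int
check_note_redirect (void)
{
  int fds[2];
  char buf[sizeof redirect_msg];
  if (pipe (fds) != 0)
    return -1;
  ssize_t written = note_redirect (fds[1]);
  ssize_t got = read (fds[0], buf, sizeof buf);
  close (fds[0]);
  close (fds[1]);
  assert (written == got);
  if (got != (ssize_t) (sizeof redirect_msg - 1))
    return -1;
  return memcmp (buf, redirect_msg, (size_t) got) == 0 ? 0 : -1;
}
```

In the handler itself the call would be note_redirect (STDERR_FILENO),
placed in the branch of deliver_child_signal that runs off the main
thread, just before pthread_kill.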


* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
  2013-06-14 18:04                                     ` Christopher Faylor
@ 2013-06-14 20:30                                       ` Ken Brown
  2013-06-15  6:01                                       ` Paul Eggert
  1 sibling, 0 replies; 10+ messages in thread
From: Ken Brown @ 2013-06-14 20:30 UTC (permalink / raw)
  To: cygwin; +Cc: 14569

On 6/14/2013 2:03 PM, Christopher Faylor wrote:
> On Fri, Jun 14, 2013 at 10:45:47AM -0700, Paul Eggert wrote:
>> Cygwin developers, I'm worried about a Cygwin bug where
>> pthread_kill may not send a signal to the correct thread.
>> This bug may be causing Emacs to crash.  The Cygwin bug is
>> discussed in this thread:
>>
>> http://cygwin.com/ml/cygwin/2012-05/msg00472.html
>>
>> Emacs uses pthread_kill to redirect
>> SIGCHLD to the main thread; if this is sent to a random
>> thread instead, that could explain the random crashes.
>>
>> My question is: does this bug still exist with Cygwin,
>> and if so is it likely to get fixed soon?
>
> You pointed to an archived mail message which implies that was fixed
> more than a year ago.  What makes you think it is still a problem?
>
> I'd expect that if it was still a problem our emacs maintainer would
> be on top of it.

Unfortunately, the emacs maintainer doesn't have any idea why the recent 
emacs changes are causing random crashes on Cygwin.  It's almost 
impossible to catch this under gdb; and the one time it was caught, the 
backtrace didn't make sense.  Also, the crash doesn't occur when emacs 
is run under strace.

I'm not going to speculate on whether the problem is caused by a bug in 
Cygwin's pthread_kill.

Ken


* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
  2013-06-14 18:04                                     ` Christopher Faylor
  2013-06-14 20:30                                       ` Ken Brown
@ 2013-06-15  6:01                                       ` Paul Eggert
  2013-06-15  6:21                                         ` Christopher Faylor
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Eggert @ 2013-06-15  6:01 UTC (permalink / raw)
  To: cygwin

On 06/14/2013 11:03 AM, Christopher Faylor wrote:

> You pointed to an archived mail message which implies that was fixed
> more than a year ago.  What makes you think it is still a problem?

The message I pointed to <http://cygwin.com/ml/cygwin/2012-05/msg00472.html>
says this:

  > Testcase signal/kill:
  > Signals may or may not reach the correct thread with 1.7.12-1 and newer.

  Confirmed.  I think the reason is that we only have a single event to
  signal that a POSIX signal arrived instead of a per-thread event, but
  I'm not sure.  This is cgf's domain so I leave it at that for now.

I interpreted this to mean "the existence of the bug is confirmed,
here's why the bug occurs, and I'll let cgf deal with it".
I didn't see any followup message where cgf (is that you?)
dealt with it.  My apologies if I misinterpreted the email.


* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
  2013-06-15  6:01                                       ` Paul Eggert
@ 2013-06-15  6:21                                         ` Christopher Faylor
  0 siblings, 0 replies; 10+ messages in thread
From: Christopher Faylor @ 2013-06-15  6:21 UTC (permalink / raw)
  To: cygwin; +Cc: Paul Eggert

On Fri, Jun 14, 2013 at 11:01:54PM -0700, Paul Eggert wrote:
>On 06/14/2013 11:03 AM, Christopher Faylor wrote:
>>You pointed to an archived mail message which implies that was fixed
>>more than a year ago.  What makes you think it is still a problem?
>
>The message I pointed to
><http://cygwin.com/ml/cygwin/2012-05/msg00472.html> says this:
>
>>Testcase signal/kill: Signals may or may not reach the correct thread
>>with 1.7.12-1 and newer.
>
>Confirmed.  I think the reason is that we only have a single event to
>signal that a POSIX signal arrived instead of a per-thread event, but
>I'm not sure.  This is cgf's domain so I leave it at that for now.
>
>I interpreted this to mean "the existence of the bug is confirmed,
>here's why the bug occurs, and I'll let cgf deal with it".  I didn't
>see any followup message where cgf (is that you?) dealt with it.  My
>apologies if I misinterpreted the email.

Oops.  I didn't read Corinna's message as thoroughly as I should have.
Sorry.

That particular issue was supposed to have been fixed in Cygwin 1.7.17,
released in October 2012.

cgf


* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
  2013-06-16 15:01   ` Christopher Faylor
@ 2013-06-16 17:52     ` Ken Brown
  0 siblings, 0 replies; 10+ messages in thread
From: Ken Brown @ 2013-06-16 17:52 UTC (permalink / raw)
  To: cygwin; +Cc: 14569



On 6/16/2013 11:01 AM, Christopher Faylor wrote:
> On Sun, Jun 16, 2013 at 09:11:21AM -0400, Ken Brown wrote:
>> [Adding the bug address back to the CC so that this gets archived.]
>>
>> On 6/15/2013 9:54 AM, Angelo Graziosi wrote:
>>> Christopher Faylor wrote
>>>>> On 06/14/2013 11:03 AM, Christopher Faylor wrote:
>>>>>> You pointed to an archived mail message which implies that was fixed
>>>>>> more than a year ago.  What makes you think it is still a problem?
>>>>>
>>>>> The message I pointed to
>>>>> <http://cygwin.com/ml/cygwin/2012-05/msg00472.html> says this:
>>>>>
>>>>>> Testcase signal/kill: Signals may or may not reach the correct thread
>>>>>> with 1.7.12-1 and newer.
>>>>>
>>>>> Confirmed.  I think the reason is that we only have a single event to
>>>>> signal that a POSIX signal arrived instead of a per-thread event, but
>>>>> I'm not sure.  This is cgf's domain so I leave it at that for now.
>>>>>
>>>>> I interpreted this to mean "the existence of the bug is confirmed,
>>>>> here's why the bug occurs, and I'll let cgf deal with it".  I didn't
>>>>> see any followup message where cgf (is that you?) dealt with it.  My
>>>>> apologies if I misinterpreted the email.
>>>>
>>>> Oops.  I didn't read Corinna's message as thoroughly as I should have.
>>>> Sorry.
>>>>
>>>> That particular issue was supposed to have been fixed in Cygwin 1.7.17,
>>>> released in October 2012.
>>>
>>> Out of curiosity, I tried the test cases I found in that thread, more
>>> precisely here:
>>>
>>>     http://cygwin.com/ml/cygwin/2012-05/msg00434.html
>>>
>>>
>>> and the results are:
>>>
>>> $ gcc otto_test1.c -o otto_test1
>>> $ ./otto_test1
>>> Testing deferred pthread_cancel()
>>>
>>> Thread 0 starting (0x200102c0)
>>> Thread 1 starting (0x20010360)
>>> Thread 2 starting (0x20010400)
>>>
>>> Cancelling thread 2 (0x20010400)
>>> Thread 2 exiting (0x20010400)
>>> Cancelling thread 1 (0x20010360)
>>> Thread 1 exiting (0x20010360)
>>> Cancelling thread 0 (0x200102c0)
>>> Thread 0 exiting (0x200102c0)
>>>
>>> Thread 0 is gone (0x200102c0)
>>> Thread 1 is gone (0x20010360)
>>> Thread 2 is gone (0x20010400)
>>>
>>> $ gcc otto_test2.c -o otto_test2
>>> $ ./otto_test2
>>> Testing asynchronous pthread_cancel()
>>>
>>> Thread 0 starting (0x200102c0)
>>> Changing canceltype from 0 to 1
>>> Thread 1 starting (0x20010360)
>>> Changing canceltype from 0 to 1
>>> Thread 2 starting (0x20010400)
>>> Changing canceltype from 0 to 1
>>>
>>> Cancelling thread 2 (0x20010400)
>>> Thread 2 exiting (0x20010400)
>>> Cancelling thread 1 (0x20010360)
>>> Thread 1 exiting (0x20010360)
>>> Cancelling thread 0 (0x200102c0)
>>> Thread 0 exiting (0x200102c0)
>>>
>>> Thread 0 is gone (0x200102c0)
>>> Thread 1 is gone (0x20010360)
>>> Thread 2 is gone (0x20010400)
>>>
>>> $ gcc otto_test3.c -o otto_test3
>>> $ ./otto_test3
>>> Testing pthread_kill()
>>>
>>> Thread 0 starting (0x200102c0)
>>> Thread 1 starting (0x20010360)
>>> Thread 2 starting (0x20010400)
>>>
>>> Sending SIGUSR1 to thread 2 (0x20010400)
>>> Thread 2 executes signal handler (0x20010400)
>>> Thread 2 encountered an error: Interrupted system call (0x20010400)
>>> Sending SIGUSR1 to thread 1 (0x20010360)
>>> Thread 1 executes signal handler (0x20010360)
>>> Thread 1 encountered an error: Interrupted system call (0x20010360)
>>> Sending SIGUSR1 to thread 0 (0x200102c0)
>>> Thread 0 executes signal handler (0x200102c0)
>>> Thread 0 encountered an error: Interrupted system call (0x200102c0)
>>>
>>> Are the errors in the last test case to be expected under the 20130612
>>> snapshot (CYGWIN_NT-5.1, 1.7.21s 20130612 21:06:59, i686 Cygwin)?
>>
>> I can replicate this on my system, consistently.  There's clearly a
>> problem, but it's not the same as in the original Cygwin bug report.  In
>> the present case, the signal is received by the right thread, but
>> something goes wrong afterwards.
>
> Try it on Linux.  I don't see any difference.  "An error" in this case
> seems to be the script working as designed.
>
> % man sem_wait
>
>      SEM_WAIT(3)   Linux Programmer's Manual      SEM_WAIT(3)
>
>
>
>      NAME
> 	   sem_wait, sem_timedwait, sem_trywait - lock a semaphore
>
>      ...
>
>      ERRORS
> 	   EINTR  The call was interrupted by a signal handler; see signal(7).

Yeah, I missed that.  Sorry for the noise.

Ken



* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
  2013-06-16 13:12 ` Ken Brown
@ 2013-06-16 15:01   ` Christopher Faylor
  2013-06-16 17:52     ` Ken Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Christopher Faylor @ 2013-06-16 15:01 UTC (permalink / raw)
  To: cygwin

On Sun, Jun 16, 2013 at 09:11:21AM -0400, Ken Brown wrote:
>[Adding the bug address back to the CC so that this gets archived.]
>
>On 6/15/2013 9:54 AM, Angelo Graziosi wrote:
>> Christopher Faylor wrote
>>>> On 06/14/2013 11:03 AM, Christopher Faylor wrote:
>>>>> You pointed to an archived mail message which implies that was fixed
>>>>> more than a year ago.  What makes you think it is still a problem?
>>>>
>>>> The message I pointed to
>>>> <http://cygwin.com/ml/cygwin/2012-05/msg00472.html> says this:
>>>>
>>>>> Testcase signal/kill: Signals may or may not reach the correct thread
>>>>> with 1.7.12-1 and newer.
>>>>
>>>> Confirmed.  I think the reason is that we only have a single event to
>>>> signal that a POSIX signal arrived instead of a per-thread event, but
>>>> I'm not sure.  This is cgf's domain so I leave it at that for now.
>>>>
>>>> I interpreted this to mean "the existence of the bug is confirmed,
>>>> here's why the bug occurs, and I'll let cgf deal with it".  I didn't
>>>> see any followup message where cgf (is that you?) dealt with it.  My
>>>> apologies if I misinterpreted the email.
>>>
>>> Oops.  I didn't read Corinna's message as thoroughly as I should have.
>>> Sorry.
>>>
>>> That particular issue was supposed to have been fixed in Cygwin 1.7.17,
>>> released in October 2012.
>>
>> Out of curiosity, I tried the test cases I found in that thread, more
>> precisely here:
>>
>>    http://cygwin.com/ml/cygwin/2012-05/msg00434.html
>>
>>
>> and the results are:
>>
>> $ gcc otto_test1.c -o otto_test1
>> $ ./otto_test1
>> Testing deferred pthread_cancel()
>>
>> Thread 0 starting (0x200102c0)
>> Thread 1 starting (0x20010360)
>> Thread 2 starting (0x20010400)
>>
>> Cancelling thread 2 (0x20010400)
>> Thread 2 exiting (0x20010400)
>> Cancelling thread 1 (0x20010360)
>> Thread 1 exiting (0x20010360)
>> Cancelling thread 0 (0x200102c0)
>> Thread 0 exiting (0x200102c0)
>>
>> Thread 0 is gone (0x200102c0)
>> Thread 1 is gone (0x20010360)
>> Thread 2 is gone (0x20010400)
>>
>> $ gcc otto_test2.c -o otto_test2
>> $ ./otto_test2
>> Testing asynchronous pthread_cancel()
>>
>> Thread 0 starting (0x200102c0)
>> Changing canceltype from 0 to 1
>> Thread 1 starting (0x20010360)
>> Changing canceltype from 0 to 1
>> Thread 2 starting (0x20010400)
>> Changing canceltype from 0 to 1
>>
>> Cancelling thread 2 (0x20010400)
>> Thread 2 exiting (0x20010400)
>> Cancelling thread 1 (0x20010360)
>> Thread 1 exiting (0x20010360)
>> Cancelling thread 0 (0x200102c0)
>> Thread 0 exiting (0x200102c0)
>>
>> Thread 0 is gone (0x200102c0)
>> Thread 1 is gone (0x20010360)
>> Thread 2 is gone (0x20010400)
>>
>> $ gcc otto_test3.c -o otto_test3
>> $ ./otto_test3
>> Testing pthread_kill()
>>
>> Thread 0 starting (0x200102c0)
>> Thread 1 starting (0x20010360)
>> Thread 2 starting (0x20010400)
>>
>> Sending SIGUSR1 to thread 2 (0x20010400)
>> Thread 2 executes signal handler (0x20010400)
>> Thread 2 encountered an error: Interrupted system call (0x20010400)
>> Sending SIGUSR1 to thread 1 (0x20010360)
>> Thread 1 executes signal handler (0x20010360)
>> Thread 1 encountered an error: Interrupted system call (0x20010360)
>> Sending SIGUSR1 to thread 0 (0x200102c0)
>> Thread 0 executes signal handler (0x200102c0)
>> Thread 0 encountered an error: Interrupted system call (0x200102c0)
>>
>> Are the errors in the last test case to be expected under the 20130612
>> snapshot (CYGWIN_NT-5.1, 1.7.21s 20130612 21:06:59, i686 Cygwin)?
>
>I can replicate this on my system, consistently.  There's clearly a 
>problem, but it's not the same as in the original Cygwin bug report.  In 
>the present case, the signal is received by the right thread, but 
>something goes wrong afterwards.

Try it on Linux.  I don't see any difference.  "An error" in this case
seems to be the script working as designed.

% man sem_wait

    SEM_WAIT(3)   Linux Programmer's Manual      SEM_WAIT(3)



    NAME
	   sem_wait, sem_timedwait, sem_trywait - lock a semaphore

    ...

    ERRORS
	   EINTR  The call was interrupted by a signal handler; see signal(7).

    ...
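
For code that does not want to treat EINTR as a failure, the usual idiom
is to retry the call.  A minimal sketch, in which sem_wait_retry and
check_sem_wait_retry are hypothetical helpers and not part of the
attached test case:

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Wait on SEM, retrying whenever a signal handler interrupts the
   call.  Returns 0 on success, or -1 with errno set to something
   other than EINTR.  */
static int
sem_wait_retry (sem_t *sem)
{
  int rc;
  do
    rc = sem_wait (sem);
  while (rc != 0 && errno == EINTR);
  return rc;
}

/* Usage check: with an initial count of 1 the first wait succeeds
   and leaves the semaphore empty.  Returns 0 on success.  */
static int
check_sem_wait_retry (void)
{
  sem_t sem;
  if (sem_init (&sem, 0, 1) != 0)
    return -1;
  int rc = sem_wait_retry (&sem);
  assert (rc == 0);
  (void) rc;
  int empty = sem_trywait (&sem) == -1 && errno == EAGAIN;
  sem_destroy (&sem);
  return empty ? 0 : -1;
}
```

The test case prints the EINTR failure instead of retrying, which is why
each thread reports "Interrupted system call" once per signal.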

cgf


* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
  2013-06-15 13:54 Angelo Graziosi
@ 2013-06-16 13:12 ` Ken Brown
  2013-06-16 15:01   ` Christopher Faylor
  0 siblings, 1 reply; 10+ messages in thread
From: Ken Brown @ 2013-06-16 13:12 UTC (permalink / raw)
  To: Angelo Graziosi; +Cc: cygwin, 14569, Paul Eggert

[-- Attachment #1: Type: text/plain, Size: 3609 bytes --]

[Adding the bug address back to the CC so that this gets archived.]

On 6/15/2013 9:54 AM, Angelo Graziosi wrote:
> Christopher Faylor wrote
>>> On 06/14/2013 11:03 AM, Christopher Faylor wrote:
>>>> You pointed to an archived mail message which implies that was fixed
>>>> more than a year ago.  What makes you think it is still a problem?
>>>
>>> The message I pointed to
>>> <http://cygwin.com/ml/cygwin/2012-05/msg00472.html> says this:
>>>
>>>> Testcase signal/kill: Signals may or may not reach the correct thread
>>>> with 1.7.12-1 and newer.
>>>
>>> Confirmed.  I think the reason is that we only have a single event to
>>> signal that a POSIX signal arrived instead of a per-thread event, but
>>> I'm not sure.  This is cgf's domain so I leave it at that for now.
>>>
>>> I interpreted this to mean "the existence of the bug is confirmed,
>>> here's why the bug occurs, and I'll let cgf deal with it".  I didn't
>>> see any followup message where cgf (is that you?) dealt with it.  My
>>> apologies if I misinterpreted the email.
>>
>> Oops.  I didn't read Corinna's message as thoroughly as I should have.
>> Sorry.
>>
>> That particular issue was supposed to have been fixed in Cygwin 1.7.17,
>> released in October 2012.
>
> Out of curiosity, I tried the test cases I found in that thread, more
> precisely here:
>
>    http://cygwin.com/ml/cygwin/2012-05/msg00434.html
>
>
> and the results are:
>
> $ gcc otto_test1.c -o otto_test1
> $ ./otto_test1
> Testing deferred pthread_cancel()
>
> Thread 0 starting (0x200102c0)
> Thread 1 starting (0x20010360)
> Thread 2 starting (0x20010400)
>
> Cancelling thread 2 (0x20010400)
> Thread 2 exiting (0x20010400)
> Cancelling thread 1 (0x20010360)
> Thread 1 exiting (0x20010360)
> Cancelling thread 0 (0x200102c0)
> Thread 0 exiting (0x200102c0)
>
> Thread 0 is gone (0x200102c0)
> Thread 1 is gone (0x20010360)
> Thread 2 is gone (0x20010400)
>
> $ gcc otto_test2.c -o otto_test2
> $ ./otto_test2
> Testing asynchronous pthread_cancel()
>
> Thread 0 starting (0x200102c0)
> Changing canceltype from 0 to 1
> Thread 1 starting (0x20010360)
> Changing canceltype from 0 to 1
> Thread 2 starting (0x20010400)
> Changing canceltype from 0 to 1
>
> Cancelling thread 2 (0x20010400)
> Thread 2 exiting (0x20010400)
> Cancelling thread 1 (0x20010360)
> Thread 1 exiting (0x20010360)
> Cancelling thread 0 (0x200102c0)
> Thread 0 exiting (0x200102c0)
>
> Thread 0 is gone (0x200102c0)
> Thread 1 is gone (0x20010360)
> Thread 2 is gone (0x20010400)
>
> $ gcc otto_test3.c -o otto_test3
> $ ./otto_test3
> Testing pthread_kill()
>
> Thread 0 starting (0x200102c0)
> Thread 1 starting (0x20010360)
> Thread 2 starting (0x20010400)
>
> Sending SIGUSR1 to thread 2 (0x20010400)
> Thread 2 executes signal handler (0x20010400)
> Thread 2 encountered an error: Interrupted system call (0x20010400)
> Sending SIGUSR1 to thread 1 (0x20010360)
> Thread 1 executes signal handler (0x20010360)
> Thread 1 encountered an error: Interrupted system call (0x20010360)
> Sending SIGUSR1 to thread 0 (0x200102c0)
> Thread 0 executes signal handler (0x200102c0)
> Thread 0 encountered an error: Interrupted system call (0x200102c0)
>
> Are the errors in the last test case to be expected under the 20130612
> snapshot (CYGWIN_NT-5.1, 1.7.21s 20130612 21:06:59, i686 Cygwin)?

I can replicate this on my system, consistently.  There's clearly a 
problem, but it's not the same as in the original Cygwin bug report.  In 
the present case, the signal is received by the right thread, but 
something goes wrong afterwards.

I'm attaching the test case for ease of reference.

Ken

[-- Attachment #2: otto_test3.c --]
[-- Type: text/plain, Size: 2544 bytes --]

/* http://cygwin.com/ml/cygwin/2012-05/msg00434.html */

#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

pthread_t tids[3];
sem_t semaphore;

static void cleanup_handler(void *arg) {
  int *intptr = (int*)arg;
  pthread_t self = pthread_self();
  fprintf(stderr, "Thread %i exiting (%p)\n", *intptr, self);
}

static void* simplethread(void *arg) {
  int *intptr = (int*)arg;
  pthread_t self = pthread_self();
  fprintf(stderr, "Thread %i starting (%p)\n", *intptr, self);

  pthread_cleanup_push(&cleanup_handler, intptr);

  while (1) {
    if (sem_wait(&semaphore) != 0) {
      fprintf(stderr, "Thread %i encountered an error: %s (%p)\n",
          *intptr, strerror(errno), self);
    } else {
      fprintf(stderr, "Thread %i woke up just fine\n", *intptr);
    }
  }

  pthread_cleanup_pop(1);
  return NULL;
}

static void sigusr1_handler(int signal __attribute((unused))) {
  pthread_t self = pthread_self();
  int tnum = 0;
  while (tnum < 3) {
    if (tids[tnum] == self) {
      break;
    }
    tnum++;
  }

  fprintf(stderr, "Thread %i executes signal handler (%p)\n", tnum, self);
}

static void install_handler(void) {
  struct sigaction act;
  act.sa_handler = &sigusr1_handler;
  sigemptyset(&(act.sa_mask));
  act.sa_flags = 0;

  if (sigaction(SIGUSR1, &act, NULL) != 0) {
    fprintf(stderr, "Can't set signal handler: %s\n", strerror(errno));
    exit(1);
  }

  sigset_t sset;
  sigemptyset(&sset);
  sigaddset(&sset, SIGUSR1);
  if (sigprocmask(SIG_UNBLOCK, &sset, NULL) != 0) {
    fprintf(stderr, "Can't unblock SIGUSR1: %s\n", strerror(errno));
  }
}

int main() {
  fprintf(stderr, "Testing pthread_kill()\n\n");

  int i;
  int result;

  sem_init(&semaphore, 0, 0);
  install_handler();

  for (i=0; i<3; i++) {
    int *intptr = (int*)malloc(sizeof(int));
    *intptr = i;
    result = pthread_create(tids+i, NULL, &simplethread, intptr);
    if (result != 0) {
      fprintf(stderr, "Can't create thread: %s\n", strerror(result));
      return 1;
    }
  }

  sleep(1);
  install_handler();
  fprintf(stderr, "\n");

  int mainint = 42;
  pthread_cleanup_push(&cleanup_handler, &mainint);

  for (i=2; i>=0; i--) {
    fprintf(stderr, "Sending SIGUSR1 to thread %i (%p)\n", i, tids[i]);
    result = pthread_kill(tids[i], SIGUSR1);
    if (result != 0) {
      fprintf(stderr, "Error during pthread_kill: %s\n", strerror(result));
    }
    sleep(1);
  }

  pthread_cleanup_pop(0);
  return 0;
}


* Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
@ 2013-06-15 13:54 Angelo Graziosi
  2013-06-16 13:12 ` Ken Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Angelo Graziosi @ 2013-06-15 13:54 UTC (permalink / raw)
  To: Cygwin; +Cc: Paul Eggert, bug-emacs

Christopher Faylor wrote
>>On 06/14/2013 11:03 AM, Christopher Faylor wrote:
>>>You pointed to an archived mail message which implies that was fixed
>>>more than a year ago.  What makes you think it is still a problem?
>>
>>The message I pointed to
>><http://cygwin.com/ml/cygwin/2012-05/msg00472.html> says this:
>>
>>>Testcase signal/kill: Signals may or may not reach the correct thread
>>>with 1.7.12-1 and newer.
>>
>>Confirmed.  I think the reason is that we only have a single event to
>>signal that a POSIX signal arrived instead of a per-thread event, but
>>I'm not sure.  This is cgf's domain so I leave it at that for now.
>>
>>I interpreted this to mean "the existence of the bug is confirmed,
>>here's why the bug occurs, and I'll let cgf deal with it".  I didn't
>>see any followup message where cgf (is that you?) dealt with it.  My
>>apologies if I misinterpreted the email.
>
> Oops.  I didn't read Corinna's message as thoroughly as I should have.
> Sorry.
>
> That particular issue was supposed to have been fixed in Cygwin 1.7.17,
> released in October 2012.

Out of curiosity, I tried the test cases I found in that thread, more 
precisely here:

   http://cygwin.com/ml/cygwin/2012-05/msg00434.html


and the results are:

$ gcc otto_test1.c -o otto_test1
$ ./otto_test1
Testing deferred pthread_cancel()

Thread 0 starting (0x200102c0)
Thread 1 starting (0x20010360)
Thread 2 starting (0x20010400)

Cancelling thread 2 (0x20010400)
Thread 2 exiting (0x20010400)
Cancelling thread 1 (0x20010360)
Thread 1 exiting (0x20010360)
Cancelling thread 0 (0x200102c0)
Thread 0 exiting (0x200102c0)

Thread 0 is gone (0x200102c0)
Thread 1 is gone (0x20010360)
Thread 2 is gone (0x20010400)

$ gcc otto_test2.c -o otto_test2
$ ./otto_test2
Testing asynchronous pthread_cancel()

Thread 0 starting (0x200102c0)
Changing canceltype from 0 to 1
Thread 1 starting (0x20010360)
Changing canceltype from 0 to 1
Thread 2 starting (0x20010400)
Changing canceltype from 0 to 1

Cancelling thread 2 (0x20010400)
Thread 2 exiting (0x20010400)
Cancelling thread 1 (0x20010360)
Thread 1 exiting (0x20010360)
Cancelling thread 0 (0x200102c0)
Thread 0 exiting (0x200102c0)

Thread 0 is gone (0x200102c0)
Thread 1 is gone (0x20010360)
Thread 2 is gone (0x20010400)

$ gcc otto_test3.c -o otto_test3
$ ./otto_test3
Testing pthread_kill()

Thread 0 starting (0x200102c0)
Thread 1 starting (0x20010360)
Thread 2 starting (0x20010400)

Sending SIGUSR1 to thread 2 (0x20010400)
Thread 2 executes signal handler (0x20010400)
Thread 2 encountered an error: Interrupted system call (0x20010400)
Sending SIGUSR1 to thread 1 (0x20010360)
Thread 1 executes signal handler (0x20010360)
Thread 1 encountered an error: Interrupted system call (0x20010360)
Sending SIGUSR1 to thread 0 (0x200102c0)
Thread 0 executes signal handler (0x200102c0)
Thread 0 encountered an error: Interrupted system call (0x200102c0)

Are the errors in the last test case to be expected under the 20130612 
snapshot (CYGWIN_NT-5.1, 1.7.21s 20130612 21:06:59, i686 Cygwin)?


Ciao,
Angelo.


end of thread, other threads:[~2013-06-16 17:52 UTC | newest]

Thread overview: 10+ messages
     [not found] <b4m61xqu654.fsf@jpl.org>
     [not found] ` <51B5DA82.4010703@alice.it>
     [not found]   ` <3EC77598-24B8-42DD-8983-5069E64AAB60@swipnet.se>
     [not found]     ` <51B62175.10307@alice.it>
     [not found]       ` <06F80BBC-D7CD-4E6C-97AD-EB8E476E2FC0@swipnet.se>
     [not found]         ` <83sj0olh38.fsf@gnu.org>
     [not found]           ` <51B7717D.6060702@cs.ucla.edu>
     [not found]             ` <51B77A00.2060908@cornell.edu>
     [not found]               ` <83mwqwl903.fsf@gnu.org>
     [not found]                 ` <51B78346.3050600@cornell.edu>
     [not found]                   ` <FA9D25B7-3D1F-40CC-AA6E-5347E8112CA4@swipnet.se>
     [not found]                     ` <E143AC75-8C2B-4A59-81F6-571B9D4EEF13@swipnet.se>
     [not found]                       ` <2E06A322-530C-4AA2-9282-6D2E48B1D194@swipnet.se>
     [not found]                         ` <51B8BEFE.6070309@cs.ucla.edu>
     [not found]                           ` <51B8D5ED.1010407@alice.it>
     [not found]                             ` <C679A2B2-0264-4DDA-B900-5B90BE7CF1E9@swipnet.se>
     [not found]                               ` <51BA03CA.4080804@cs.ucla.edu>
     [not found]                                 ` <BEC82502-E9FD-4F8E-B91E-F680F6885FB2@swipnet.se>
2013-06-14 17:45                                   ` bug#14569: 24.3.50; bootstrap fails on Cygwin Paul Eggert
2013-06-14 18:04                                     ` Christopher Faylor
2013-06-14 20:30                                       ` Ken Brown
2013-06-15  6:01                                       ` Paul Eggert
2013-06-15  6:21                                         ` Christopher Faylor
2013-06-14 18:16                                     ` Eli Zaretskii
2013-06-15 13:54 Angelo Graziosi
2013-06-16 13:12 ` Ken Brown
2013-06-16 15:01   ` Christopher Faylor
2013-06-16 17:52     ` Ken Brown
