public inbox for cygwin@cygwin.com
From: Erik Bray <erik.m.bray@gmail.com>
To: cygwin@cygwin.com
Subject: Re: Cygwin hanging in pselect
Date: Mon, 09 Jan 2017 11:01:00 -0000
Message-ID: <CAOTD34Z_58ce-E0wCuJP67UODZLmWncuXaZUGOyNeYX_atXh6w@mail.gmail.com>
In-Reply-To: <CAOTD34ZBJSD2guV9Qjz_wvtMH+vWYi7Hgr3NdaJ81FPOBycuZA@mail.gmail.com>

On Fri, Jan 6, 2017 at 12:40 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
> Hello, and happy new-ish year,
>
> I've been working on and off over the past few months on bringing
> Python's compatibility with Cygwin up to snuff, including having all
> pertinent tests passing.  I've noticed that there are several tests
> (which I currently skip) that cause the process to hang indefinitely,
> and not respond to any signals from Cygwin (it can only be killed from
> Windows).  This is Cygwin 64-bit--I have not tested 32-bit.
>
> I finally looked into this problem and found the lockup to be in
> pselect() somewhere.  Attached I've provided the most minimal example
> I've been able to come up with so far that reproduces the problem,
> which I'll describe in a bit more detail next. I would attach a
> cygcheck output if requested, but I was also able to reproduce this on
> a recent build from source.
>
> So far as I've been able to tell, the problem only occurs with AF_UNIX
> sockets.  In the example I have a 'server' socket and a 'client'
> socket both set to non-blocking.  The client connects to the socket,
> returning errno EINPROGRESS as expected.  Then I do a pselect on the
> client socket to wait until it is ready to be read from.  The hang
> only happens when I pselect on the client socket, and not on the
> server socket.  It doesn't seem to make a difference what the timeout
> is.  One thing I have not tried is if the client and server are
> actually different processes, but the example from the Python tests
> this is reproducing is where they are both in the same process.
>
> Below is (I think) the most relevant output from strace on the test
> case.  It seems to hang somewhere in socket_cleanup, but I haven't
> investigated any further than that.
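
For readers without the attachment, the setup described in the quoted
message can be sketched roughly as follows.  This is not the original
attachment: the function name pselect_on_client, its parameters, and the
socket path passed to it are made up for illustration.  On a platform
without the problem, nothing ever accept()s or writes, so the pselect()
should simply time out and return 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper (not from the original attachment): create a
 * non-blocking AF_UNIX server and client in one process, connect the
 * client, then pselect() on the client for readability.
 * Returns pselect()'s result (0 = timed out), or -1 on a setup failure.
 */
int pselect_on_client(const char *path, long timeout_ms)
{
    struct sockaddr_un addr;
    struct timespec ts;
    fd_set rfds;
    int server = -1, client = -1, res = -1;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                       /* remove a stale socket file */

    server = socket(AF_UNIX, SOCK_STREAM, 0);
    client = socket(AF_UNIX, SOCK_STREAM, 0);
    if (server < 0 || client < 0)
        goto out;
    if (set_nonblocking(server) < 0 || set_nonblocking(client) < 0)
        goto out;
    if (bind(server, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (listen(server, 1) < 0)
        goto out;

    /* Non-blocking connect: immediate success and EINPROGRESS are both fine. */
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0
        && errno != EINPROGRESS)
        goto out;

    FD_ZERO(&rfds);
    FD_SET(client, &rfds);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    /* The server never accepts or writes, so the client is never readable. */
    res = pselect(client + 1, &rfds, NULL, NULL, &ts, NULL);

out:
    if (client >= 0)
        close(client);
    if (server >= 0)
        close(server);
    unlink(path);
    return res;
}
```

Calling pselect_on_client("/tmp/some.sock", 100) from a small main() is
enough to exercise it; the path is arbitrary.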

I made a little bit of progress debugging this, but now I'm stumped.
It seems the problem is this:

For each socket whose fd is passed to select() a thread_socket is
started which calls peek_socket until there are bits ready on the
socket, or until the timeout is reached.  This in turn calls
fhandler_socket::evaluate_events.

The reason it only locks up on my "client" socket, the one on which
connect() is called, is that evaluate_events notes that the socket is
waiting to connect, and this passes control to
fhandler_socket::af_local_connect().  af_local_connect() temporarily
sets the socket to blocking, then sends a magic string to the socket
(you can see in my strace log that this succeeds).  What's strange,
and what I don't understand, is that there are no FD_READ or FD_OOB
events recorded for the WSASendTo call from af_local_send_secret().
Then, after af_local_send_secret() it calls af_local_recv_secret().
This calls recv_internal() which in turn calls recursively into
fhandler_socket::evaluate_events where it waits for an FD_READ or
FD_OOB event that never arrives.  And since it set the socket to
blocking it just sits in an infinite loop.

Meanwhile the timer for the select() call expires and tries to shut
down the thread_socket thread, but it can't because that thread never
completes.
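
To make the shape of the hang concrete, here is a simplified model of a
per-socket polling thread with a timeout, written with pthreads.  It is
only a sketch under assumptions: struct poller, poller_main,
poll_with_timeout and never_ready are hypothetical names, and none of
this is taken from the Cygwin source.  The point is that the stop flag is
only re-checked between calls to peek(), so a peek() that blocks forever
also blocks the join.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state shared between the "select" side and its polling
   thread; these names do not come from Cygwin. */
struct poller {
    atomic_bool stop;       /* set when the timeout expires */
    atomic_bool ready;      /* set once peek() reports events */
    bool (*peek)(void *);   /* stand-in for peek_socket/evaluate_events */
    void *arg;
};

static void *poller_main(void *p)
{
    struct poller *pl = p;
    const struct timespec nap = { 0, 1000000L };   /* 1 ms between polls */

    while (!atomic_load(&pl->stop)) {
        /* If peek() never returns, the stop flag is never re-checked. */
        if (pl->peek(pl->arg)) {
            atomic_store(&pl->ready, true);
            break;
        }
        nanosleep(&nap, NULL);
    }
    return NULL;
}

/* A well-behaved peek: returns promptly and reports no events. */
static bool never_ready(void *arg)
{
    (void)arg;
    return false;
}

/* Start the poller, wait timeout_ms, then stop and join it.
   Returns true if peek() reported events before the thread was stopped. */
bool poll_with_timeout(bool (*peek)(void *), void *arg, long timeout_ms)
{
    struct poller pl;
    struct timespec ts = { timeout_ms / 1000,
                           (timeout_ms % 1000) * 1000000L };
    pthread_t tid;

    atomic_init(&pl.stop, false);
    atomic_init(&pl.ready, false);
    pl.peek = peek;
    pl.arg = arg;
    if (pthread_create(&tid, NULL, poller_main, &pl) != 0)
        return false;

    nanosleep(&ts, NULL);            /* stand-in for the select() timer */
    atomic_store(&pl.stop, true);
    pthread_join(tid, NULL);         /* would hang if peek() were stuck */
    return atomic_load(&pl.ready);
}
```

With never_ready the thread exits shortly after the timeout and the join
returns.  Swapping in a peek() that waits indefinitely for an event
would reproduce the shape of the hang described above.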

What I don't understand is why there is not an event recorded for the
WSASendTo in send_internal.  I even wrapped it with the following
debug code to wait for an FD_READ event immediately following the
WSASendTo:

      else if (get_socket_type () == SOCK_STREAM)
        {
          WSAEventSelect (get_socket (), wsock_evt, EVENT_MASK);
          res = WSASendTo (get_socket (), out_buf, out_idx, &ret, flags,
                           wsamsg->name, wsamsg->namelen, NULL, NULL);
          debug_printf ("WSASendTo sent %d bytes; ret: %d", ret, res);
          while (!(res = wait_for_events (FD_READ | FD_OOB, 0)))
            {
              debug_printf ("Waiting for socket to be readable");
            }
        }



But the strace at this point just outputs:
   62  108286 [socksel] poll_test 24152 fhandler_socket::af_local_connect: af_local_connect called, no_getpeereid=0
  156  108442 [socksel] poll_test 24152 fhandler_socket::send_internal: WSASendTo sent 16 bytes; ret: 0

It never returns from send_internal.  I don't have deep knowledge of
WinSock, but from what I've read ISTM WSASendTo should have triggered
an FD_READ event on the socket, and it doesn't for some reason.
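
One way to check the expectation above against plain POSIX behaviour is
the sketch below.  It uses socketpair() and select() rather than Winsock,
so it is only an analogue, and the helper names readable_now and
send_and_check are invented for the example.  On POSIX systems, sending
on one end of a connected pair makes the other end readable, not the
sender.  If Winsock's FD_READ follows the same rule (an assumption here,
not something verified against the Cygwin source), the event would show
up on the client only once the peer writes something back.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error. */
static int readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };      /* poll without blocking */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/*
 * Hypothetical helper: send 16 bytes on one end of a socketpair and
 * report whether each end is readable afterwards.
 * Returns 0 on success, -1 on failure.
 */
int send_and_check(int *sender_readable, int *peer_readable)
{
    int sv[2];
    char secret[16] = { 0 };           /* same size as the 16 bytes in the log */
    ssize_t sent;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    sent = send(sv[0], secret, sizeof(secret), 0);
    if (sent != (ssize_t)sizeof(secret)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    *sender_readable = readable_now(sv[0]);   /* the end that sent */
    *peer_readable = readable_now(sv[1]);     /* the end that received */

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

On Linux this reports the sending end as not readable and the peer as
readable.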

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
