* Cygwin hanging in pselect
@ 2017-01-06 11:40 Erik Bray
2017-01-09 11:01 ` Erik Bray
0 siblings, 1 reply; 2+ messages in thread
From: Erik Bray @ 2017-01-06 11:40 UTC (permalink / raw)
To: cygwin
[-- Attachment #1: Type: text/plain, Size: 10081 bytes --]
Hello, and happy new-ish year,
I've been working on and off over the past few months on bringing
Python's compatibility with Cygwin up to snuff, including having all
pertinent tests passing. I've noticed that there are several tests
(which I currently skip) that cause the process to hang indefinitely,
and not respond to any signals from Cygwin (it can only be killed from
Windows). This is Cygwin 64-bit--I have not tested 32-bit.
I finally looked into this problem and found the lockup to be in
pselect() somewhere. Attached I've provided the most minimal example
I've been able to come up with so far that reproduces the problem,
which I'll describe in a bit more detail next. I would attach a
cygcheck output if requested, but I was also able to reproduce this on
a recent build from source.
So far as I've been able to tell, the problem only occurs with AF_UNIX
sockets. In the example I have a 'server' socket and a 'client'
socket both set to non-blocking. The client connects to the socket,
returning errno EINPROGRESS as expected. Then I do a pselect on the
client socket to wait until it is ready to be read from. The hang
only happens when I pselect on the client socket, and not on the
server socket. It doesn't seem to make a difference what the timeout
is. One thing I have no tried is if the client and server are
actually different processes, but the example from the Python tests
this is reproducing is where they are both in the same process.
Below is (I think) the most relevant output from strace on the test
case. It seems to hang somewhere in socket_cleanup, but I haven't
investigated any further than that.
Thanks,
Erik
261 14732 [main] poll_test 79200 cygwin_socket: socket (1, 1 (flags 0x0), 0)
--- Process 79200 loaded C:\Windows\System32\ws2_32.dll at 00007FFF8D5D0000
985 15717 [main] poll_test 79200 wsock_init: res 0
21 15738 [main] poll_test 79200 wsock_init: wVersion 514
15 15753 [main] poll_test 79200 wsock_init: wHighVersion 514
12 15765 [main] poll_test 79200 wsock_init: szDescription WinSock 2.0
13 15778 [main] poll_test 79200 wsock_init: szSystemStatus Running
17 15795 [main] poll_test 79200 wsock_init: iMaxSockets 0
16 15811 [main] poll_test 79200 wsock_init: iMaxUdpDg 0
--- Process 79200 loaded C:\Windows\System32\mswsock.dll at 00007FFF89540000
557 16368 [main] poll_test 79200 build_fh_pc: fh 0x18030BE70, dev 001E0079
87 16455 [main] poll_test 79200 fhandler_base::set_flags: flags
0x10002, supplied_bin 0x0
18 16473 [main] poll_test 79200 fhandler_base::set_flags:
O_TEXT/O_BINARY set in flags 0x10000
15 16488 [main] poll_test 79200 fhandler_base::set_flags:
filemode set to binary
15 16503 [main] poll_test 79200 fdsock: fd 3, name '', soc 0x180
21 16524 [main] poll_test 79200 cygwin_socket: 3 = socket(1, 1
(flags 0x0), 0)
15 16539 [main] poll_test 79200 fcntl64: fcntl(3, 3, ...)
17 16556 [main] poll_test 79200 fhandler_base::fcntl: GETFL: 0x10002
15 16571 [main] poll_test 79200 fcntl64: 65538 = fcntl(3, 3, 0x0)
16 16587 [main] poll_test 79200 fcntl64: fcntl(3, 4, ...)
22 16609 [main] poll_test 79200 fhandler_socket::ioctl: socket is
now nonblocking
22 16631 [main] poll_test 79200 fhandler_socket::ioctl: 0 =
ioctl_socket(8004667E, 0xFFFFC9FC)
17 16648 [main] poll_test 79200 fhandler_base::set_flags: flags
0x14002, supplied_bin 0x0
18 16666 [main] poll_test 79200 fhandler_base::set_flags:
O_TEXT/O_BINARY set in flags 0x10000
15 16681 [main] poll_test 79200 fhandler_base::set_flags:
filemode set to binary
15 16696 [main] poll_test 79200 fcntl64: 0 = fcntl(3, 4, 0x14002)
21 16717 [main] poll_test 79200 normalize_posix_path: src @test.sock
... snip path checking stuff...
19 17118 [main] poll_test 79200 path_conv::check:
this->path(C:\Users\Erik M. Bray\src\python\cpython\@test.sock),
has_acls(1)
70 17188 [main] poll_test 79200 fhandler_socket::bind: AF_LOCAL:
socket bound to port 55085
298 17486 [main] poll_test 79200 set_posix_access: ACL-Size: 100
37 17523 [main] poll_test 79200 set_posix_access: Created SD-Size: 176
--- Process 79200 loaded C:\Windows\System32\cryptbase.dll at 00007FFF898B0000
--- Process 79200 loaded C:\Windows\System32\bcryptprimitives.dll at
00007FFF8AE30000
3492 21015 [main] poll_test 79200 cygwin_bind: 0 = bind(3, 0xFFFFCB10, 110)
112 21127 [main] poll_test 79200 getpid: 79200 = getpid()
27 21154 [main] poll_test 79200 cygwin_listen: 0 = listen(3, 5)
21 21175 [main] poll_test 79200 cygwin_socket: socket (1, 1 (flags 0x0), 0)
68 21243 [main] poll_test 79200 build_fh_pc: fh 0x18030C310, dev 001E0079
44 21287 [main] poll_test 79200 fhandler_base::set_flags: flags
0x10002, supplied_bin 0x0
15 21302 [main] poll_test 79200 fhandler_base::set_flags:
O_TEXT/O_BINARY set in flags 0x10000
13 21315 [main] poll_test 79200 fhandler_base::set_flags:
filemode set to binary
13 21328 [main] poll_test 79200 fdsock: fd 4, name '', soc 0x188
20 21348 [main] poll_test 79200 cygwin_socket: 4 = socket(1, 1
(flags 0x0), 0)
16 21364 [main] poll_test 79200 fcntl64: fcntl(4, 3, ...)
15 21379 [main] poll_test 79200 fhandler_base::fcntl: GETFL: 0x10002
13 21392 [main] poll_test 79200 fcntl64: 65538 = fcntl(4, 3, 0x0)
17 21409 [main] poll_test 79200 fcntl64: fcntl(4, 4, ...)
14 21423 [main] poll_test 79200 fhandler_socket::ioctl: socket is
now nonblocking
14 21437 [main] poll_test 79200 fhandler_socket::ioctl: 0 =
ioctl_socket(8004667E, 0xFFFFC9FC)
13 21450 [main] poll_test 79200 fhandler_base::set_flags: flags
0x14002, supplied_bin 0x0
13 21463 [main] poll_test 79200 fhandler_base::set_flags:
O_TEXT/O_BINARY set in flags 0x10000
13 21476 [main] poll_test 79200 fhandler_base::set_flags:
filemode set to binary
12 21488 [main] poll_test 79200 fcntl64: 0 = fcntl(4, 4, 0x14002)
20 21508 [main] poll_test 79200 normalize_posix_path: src @test.sock
...
76 21922 [main] poll_test 79200 getpid: 79200 = getpid()
--- Process 79200 thread 18528 created
350 22272 [main] poll_test 79200 __set_errno: void
__set_winsock_errno(const char*, int):224 setting errno 119
26 22298 [main] poll_test 79200 __set_winsock_errno: connect:1232
- winsock error 10036 -> errno 119
17 22315 [main] poll_test 79200 cygwin_connect: -1 = connect(4,
0xFFFFCB10, 110), errno 119
100 22415 [main] poll_test 79200 time: 1483702462 = time(0x0)
731 23146 [main] poll_test 79200
pwdgrp::fetch_account_from_windows: line:
<Administrators:S-1-5-32-544:544:>
108 23254 [main] poll_test 79200 stat64: entering
32 23286 [main] poll_test 79200 normalize_posix_path: src /dev
30 23316 [main] poll_test 79200 normalize_posix_path: /dev =
normalize_posix_path (/dev)
26 23342 [main] poll_test 79200 mount_info::conv_to_win32_path:
conv_to_win32_path (/dev)
27 23369 [main] poll_test 79200 set_flags: flags: binary (0x2)
23 23392 [main] poll_test 79200 mount_info::conv_to_win32_path:
src_path /dev, dst C:\cygwin64\dev, flags 0x3000A, rc 0
54 23446 [main] poll_test 79200 symlink_info::check: 0x0 =
NtCreateFile (\??\C:\cygwin64\dev)
37 23483 [main] poll_test 79200 symlink_info::check: not a symlink
29 23512 [main] poll_test 79200 symlink_info::check: 0 =
symlink.check(C:\cygwin64\dev, 0xFFFFB250) (0x43000A)
39 23551 [main] poll_test 79200 build_fh_pc: fh 0x18030C700, dev 000000C1
30 23581 [main] poll_test 79200 stat_worker:
(\??\C:\cygwin64\dev, 0x1802E2A20, 0x18030C700), file_attributes 16
194 23775 [main] poll_test 79200 fhandler_base::fstat_helper: 0 =
fstat (\??\C:\cygwin64\dev, 0x1802E2A20) st_size=0, st_mode=040755,
st_ino=562949953536516st_atim=57696141.1B43B4B0
st_ctim=57696141.1B43B4B0 st_mtim=57696141.1B43B4B0
st_birthtim=5769612C.1F3F08BC
44 23819 [main] poll_test 79200 stat_worker: 0 =
(\??\C:\cygwin64\dev,0x1802E2A20)
38 23857 [main] poll_test 79200 fstat64: 0 = fstat(1, 0xFFFFC580)
48 23905 [main] poll_test 79200 isatty: 1 = isatty(1)
154 24059 [main] poll_test 79200 fhandler_pty_slave::write: pty9,
write(0x6000426B0, 40)
51 24110 [main] poll_test 79200
fhandler_pty_common::process_opost_output: (1901): pty output_mutex
(0x140): waiting -1 ms
47 24157 [main] poll_test 79200
fhandler_pty_common::process_opost_output: (1901): pty output_mutex:
acquired
23 24180 [main] poll_test 79200
fhandler_pty_common::process_opost_output: (1940): pty
output_mutex(0x140) released
Ret from client connect: -1; errno: 119
23 24203 [main] poll_test 79200 write: 40 = write(1, 0x6000426B0, 40)
99 24302 [main] poll_test 79200 pselect: pselect (5, 0xFFFFCB90,
0x0, 0x0, 0xFFFFCB80, 0x0)
20 24322 [main] poll_test 79200 pselect: to->tv_sec 1,
to->tv_nsec 0, us 1000000
41 24363 [main] poll_test 79200 dtable::select_read: fd 4
33 24396 [main] poll_test 79200 select: sel.always_ready 0
97 24493 [main] poll_test 79200 start_thread_socket: stuff_start 0xFFFFC8E8
--- Process 79200 thread 46528 created
172 24665 [socksel] poll_test 79200 cygthread::stub: thread
'socksel', id 0xB5C0, stack_ptr 0x12ECCD0
24 24689 [socksel] poll_test 79200 thread_socket: stuff_start
0xFFFFC8E8, timeout 4294967295
23 24712 [main] poll_test 79200 select_stuff::wait: m 4, us
1000000, wmfo_timeout -1
31 24743 [socksel] poll_test 79200
fhandler_socket::af_local_connect: af_local_connect called,
no_getpeereid=0
106 24849 [socksel] poll_test 79200
fhandler_socket::af_local_send_secret: Sending af_local secret
succeeded
999984 1024833 [main] poll_test 79200 select_stuff::wait: wait_ret 3,
m = 4. verifying
65 1024898 [main] poll_test 79200 select_stuff::wait: timed out
38 1024936 [main] poll_test 79200 select_stuff::wait: returning 1
31 1024967 [main] poll_test 79200 select: sel.wait returns 1
18 1024985 [main] poll_test 79200 select_stuff::cleanup: calling
cleanup routines
16 1025001 [main] poll_test 79200 socket_cleanup: si 0x6000526C0
si->thread 0x1801FE758
[-- Attachment #2: poll_test.c --]
[-- Type: text/x-csrc, Size: 1176 bytes --]
#include <arpa/inet.h>
#include <sys/socket.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/time.h>
#include <sys/un.h>
#include <errno.h>
#define SET_NONBLOCKING(sock) fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);
int main(void) {
fd_set rfds;
struct timespec tv;
int sock_server, sock_client;
int retval;
struct sockaddr_un addr; \
memset(&addr, 0, sizeof(addr)); \
addr.sun_family = AF_UNIX; \
strcpy(addr.sun_path, "@test.sock");
sock_server = socket(AF_UNIX, SOCK_STREAM, 0);
SET_NONBLOCKING(sock_server);
if (bind(sock_server, (struct sockaddr*)&addr, sizeof(addr))) {
printf("binding server socket failed");
return 1;
}
listen(sock_server, 5);
sock_client = socket(AF_UNIX, SOCK_STREAM, 0);
SET_NONBLOCKING(sock_client);
retval = connect(sock_client, (struct sockaddr*)&addr, sizeof(addr));
printf("Ret from client connect: %d; errno: %d\n", retval, errno);
FD_ZERO(&rfds);
FD_SET(sock_client, &rfds);
tv.tv_sec = 1;
tv.tv_nsec = 0;
retval = pselect(sock_client + 1, &rfds, NULL, NULL, &tv, NULL);
return 0;
}
[-- Attachment #3: Type: text/plain, Size: 219 bytes --]
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: Cygwin hanging in pselect
2017-01-06 11:40 Cygwin hanging in pselect Erik Bray
@ 2017-01-09 11:01 ` Erik Bray
0 siblings, 0 replies; 2+ messages in thread
From: Erik Bray @ 2017-01-09 11:01 UTC (permalink / raw)
To: cygwin
On Fri, Jan 6, 2017 at 12:40 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
> Hello, and happy new-ish year,
>
> I've been working on and off over the past few months on bringing
> Python's compatibility with Cygwin up to snuff, including having all
> pertinent tests passing. I've noticed that there are several tests
> (which I currently skip) that cause the process to hang indefinitely,
> and not respond to any signals from Cygwin (it can only be killed from
> Windows). This is Cygwin 64-bit--I have not tested 32-bit.
>
> I finally looked into this problem and found the lockup to be in
> pselect() somewhere. Attached I've provided the most minimal example
> I've been able to come up with so far that reproduces the problem,
> which I'll describe in a bit more detail next. I would attach a
> cygcheck output if requested, but I was also able to reproduce this on
> a recent build from source.
>
> So far as I've been able to tell, the problem only occurs with AF_UNIX
> sockets. In the example I have a 'server' socket and a 'client'
> socket both set to non-blocking. The client connects to the socket,
> returning errno EINPROGRESS as expected. Then I do a pselect on the
> client socket to wait until it is ready to be read from. The hang
> only happens when I pselect on the client socket, and not on the
> server socket. It doesn't seem to make a difference what the timeout
> is. One thing I have no tried is if the client and server are
> actually different processes, but the example from the Python tests
> this is reproducing is where they are both in the same process.
>
> Below is (I think) the most relevant output from strace on the test
> case. It seems to hang somewhere in socket_cleanup, but I haven't
> investigated any further than that.
I made a little bit of progress debugging this, but now I'm stumped.
It seems the problem is this:
For each socket whose fd is passed to select() a thread_socket is
started which calls peek_socket until there are bits ready on the
socket, or until the timeout is reached. This in turn calls
fhandler_socket::evaluate_events.
The reason it's only locking up on my "client thread" on which
connect() is called, is that evaluate_events notes that the socket is
waiting to connect, and this passes control to
fhandler_socket::af_local_connect(). af_local_connect() temporarily
sets the socket to blocking, then sends a magic string to the socket
(you can see in my strace log that this succeeds). What's strange,
and what I don't understand, is that there are no FD_READ or FD_OOB
events recorded for the WSASendTo call from af_local_send_secret().
Then, after af_local_send_secret() it calls af_local_recv_secret().
This calls recv_internal() which in turn calls recursively into
fhandler_socket::evaluate_events where it waits for an FD_READ or
FD_OOB event that never arrives. And since it set the socket to
blocking it just sits in an infinite loop.
Meanwhile the timer for the select() call expires and tries to shut
down the thread_socket but it can't because it never completes.
What I don't understand is why there is not an event recorded for the
WSASendTo in send_internal. I even wrapped it with the following
debug code to wait for an FD_READ event immediately following the
WSASendTo:
else if (get_socket_type () == SOCK_STREAM)
{
WSAEventSelect(get_socket (), wsock_evt, EVENT_MASK);
res = WSASendTo (get_socket (), out_buf, out_idx, &ret, flags,
wsamsg->name, wsamsg->namelen, NULL, NULL);
debug_printf("WSASendTo sent %d bytes; ret: %d", ret, res);
while (!(res=wait_for_events (FD_READ | FD_OOB, 0))) {
debug_printf("Waiting for socket to be readable");
}
}
But the strace at this point just outputs:
62 108286 [socksel] poll_test 24152
fhandler_socket::af_local_connect: af_local_connect called,
no_getpeereid=0
156 108442 [socksel] poll_test 24152
fhandler_socket::send_internal: WSASendTo sent 16 bytes; ret: 0
It never returns from send_internal. I don't have deep knowledge of
WinSock, but from what I've read ISTM WSASendTo should have triggered
an FD_READ event on the socket, and it doesn't for some reason.
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2017-01-09 11:01 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-06 11:40 Cygwin hanging in pselect Erik Bray
2017-01-09 11:01 ` Erik Bray
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).