public inbox for cygwin@cygwin.com
* RE: ssh.exe on cygwin: Write error
@ 2013-07-16  8:26 Devin Nate
  2013-07-18  7:22 ` Devin Nate
  0 siblings, 1 reply; 3+ messages in thread
From: Devin Nate @ 2013-07-16  8:26 UTC
  To: cygwin

Dear Cygwin list;

So I've made some progress on the ssh problem I originally set out to solve... unfortunately, it has led me into select.cc in Cygwin.

Basically, ssh.exe operates as follows:

ssh sets up a connection and starts client_loop.

client_loop monitors (in the case I'm debugging) a single channel. On each pass it checks whether there is input to read (from stdin in this case), whether there is data waiting in an output buffer, and whether select() reports the outbound connection as writable. In the case I'm debugging, the network connection from ssh.exe to the server is fd 3.

If there's data to read, it reads it into a buffer.
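To make the loop described above concrete, here is a minimal C sketch of one iteration, assuming POSIX select() and read() and a fixed-size output buffer. The names `loop_state` and `loop_iteration` are hypothetical; this is not OpenSSH's actual client_loop, only a model of the behaviour described: watch stdin for input while there is buffer space, and ask whether the network fd is writable only when output is pending.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified model of one client_loop pass (not OpenSSH code).
 * in_fd plays the role of stdin, net_fd the network connection (fd 3). */
struct loop_state {
    char   obuf[4096];   /* pending output bound for net_fd */
    size_t olen;         /* bytes currently buffered */
};

/* Returns 1 if output is pending and select() reported net_fd writable
 * (the caller should then attempt the write), 0 otherwise, and -1 on
 * select()/read() failure or EOF on in_fd. */
int loop_iteration(int in_fd, int net_fd, struct loop_state *st,
                   struct timeval *timeout)
{
    fd_set rset, wset;
    int maxfd = in_fd > net_fd ? in_fd : net_fd;

    FD_ZERO(&rset);
    FD_ZERO(&wset);
    if (st->olen < sizeof st->obuf)
        FD_SET(in_fd, &rset);        /* room left: accept more input */
    if (st->olen > 0)
        FD_SET(net_fd, &wset);       /* only ask about writability if needed */

    int n = select(maxfd + 1, &rset, &wset, NULL, timeout);
    if (n < 0)
        return -1;

    if (FD_ISSET(in_fd, &rset)) {    /* input available: append to buffer */
        ssize_t r = read(in_fd, st->obuf + st->olen,
                         sizeof st->obuf - st->olen);
        if (r <= 0)
            return -1;
        st->olen += (size_t)r;
    }
    return st->olen > 0 && FD_ISSET(net_fd, &wset);
}
```

On the first pass with an empty buffer only stdin is watched; once data is buffered, the next pass also asks select() about the network fd, which is the point where the trace below starts to matter.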

If there's data in the output buffer AND select() says that fd 3 is writable, it calls packet_write_poll, which calls roaming_write, which does a write() on the fd. If write() fails, packet_write_poll checks the error: EAGAIN, EINTR, and EWOULDBLOCK (the same value as EAGAIN on Cygwin) are non-fatal. Any other error is fatal.
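The error policy described for packet_write_poll can be sketched like this, assuming a POSIX write() on a non-blocking socket. `classify_write_errno` and `flush_output` are hypothetical names, not the real OpenSSH functions, and the roaming_write layer is left out.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch of the policy described above (not OpenSSH source). */
enum write_status { WRITE_OK, WRITE_RETRY, WRITE_FATAL };

/* EAGAIN/EWOULDBLOCK/EINTR: wait for the next select() and try again.
 * Anything else (e.g. ECONNRESET) is treated as fatal. */
enum write_status classify_write_errno(int err)
{
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return WRITE_RETRY;
    return WRITE_FATAL;
}

/* Try to flush *len bytes from buf to fd. On a partial write, move the
 * unsent bytes to the front of buf and update *len. */
enum write_status flush_output(int fd, char *buf, size_t *len)
{
    ssize_t n = write(fd, buf, *len);
    if (n < 0)
        return classify_write_errno(errno);
    memmove(buf, buf + n, *len - (size_t)n);
    *len -= (size_t)n;
    return WRITE_OK;
}
```

Under this policy the EAGAIN in step 2 is harmless on its own; only the later ECONNRESET ends the session.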


While debugging, client_loop processes data just fine for a while. As it happens, it is reading more data from stdin than it is writing out, and it is happily writing data to the outbound socket using write(), called by roaming_write, called by packet_write_poll. At some point, something "bad" occurs:

1. select() says that fd 3 (the outbound connection) is writable.

2. write() is called but fails with error 11 (EAGAIN).

3. Many subsequent calls to select() (probably 50-100) report that the socket is not writable, and a packet trace on the server side confirms that the flow of packets has completely stopped. In the strace I can see peek_socket() in select.cc reporting 'peek_socket: read_ready: 0, write_ready: 0, except_ready: 0'.

4. After some time (about 30 seconds), select() reports fd 3 as both readable and writable. ssh tries to read from fd 3 but gets error 104 (ECONNRESET). It then tries to write to the socket and also gets error 104 (ECONNRESET).

5. Since write() failed, the error is returned to roaming_write and then to packet_write_poll, which prints the fatal error "Write failed: connection reset by peer".

6. Interestingly, the server side has not sent a TCP RST. In fact, from the server's perspective, the TCP connection simply stalled (starting right at the error 11). The server side doesn't shut the connection down until some time later.

7. The connection definitely does get 'backed up', so to speak: I'm pushing more data than the internet connection can carry, so I would expect select() to report the socket as not writable and/or write() to return EAGAIN until the network drains some buffers. That said, it's almost as if the socket dies, or needs to be reset, after the error 11 (EAGAIN).

8. I don't see any signals or timeouts occurring. I've also retested with Cygwin 1.7.21, with the same result.
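Steps 2 and 3 are what ordinary flow control looks like from user space: once the send buffer is full, write() fails with EAGAIN and select() stops reporting the socket as writable until the peer drains data. The sketch below reproduces that locally, assuming a POSIX system and an AF_UNIX socketpair whose peer never reads (standing in for the stalled TCP connection). It does not model the router fault or the later ECONNRESET, and the helper names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Make fd non-blocking. Returns -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write until the kernel refuses more data. Returns the number of bytes
 * accepted, or -1 if write() failed with something other than EAGAIN. */
long fill_until_eagain(int fd)
{
    char chunk[4096] = {0};
    long total = 0;
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;            /* send buffer full: step 2 */
        if (n < 0 && errno == EINTR)
            continue;
        return -1;
    }
}

/* Poll writability without blocking: 1 writable, 0 not, -1 error. */
int is_writable(int fd)
{
    fd_set wset;
    struct timeval tv = {0, 0};
    FD_ZERO(&wset);
    FD_SET(fd, &wset);
    int n = select(fd + 1, NULL, &wset, NULL, &tv);
    return n < 0 ? -1 : FD_ISSET(fd, &wset) != 0;
}
```

Reading from the other end of the pair frees buffer space, after which is_writable() reports the socket as writable again; in the case reported here that drain never happened because the router stopped passing packets.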


I'm going to keep looking, but does anyone have thoughts given this new information?

Thanks,
Devin






--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* RE: ssh.exe on cygwin: Write error
  2013-07-16  8:26 ssh.exe on cygwin: Write error Devin Nate
@ 2013-07-18  7:22 ` Devin Nate
  2013-07-18  8:11   ` Christopher Faylor
  0 siblings, 1 reply; 3+ messages in thread
From: Devin Nate @ 2013-07-18  7:22 UTC
  To: cygwin

Final followup to close the loop on this.

Having debugged rsync, ssh, and finally Cygwin, I found that the problem was a D-Link router doing (a bad job of) QoS processing.

rsync, ssh, and Cygwin all appear to have behaved exactly as they should, including pipe(), select(), stdin/stdout, and Windows socket handling.

Thanks,
Devin


-----Original Message-----
Sent: Tuesday, July 16, 2013 12:04 AM
Subject: RE: ssh.exe on cygwin: Write error


* Re: ssh.exe on cygwin: Write error
  2013-07-18  7:22 ` Devin Nate
@ 2013-07-18  8:11   ` Christopher Faylor
  0 siblings, 0 replies; 3+ messages in thread
From: Christopher Faylor @ 2013-07-18  8:11 UTC
  To: cygwin

On Thu, Jul 18, 2013 at 02:26:39AM +0000, Devin Nate wrote:
>Final followup to close the loop on this.
>
>Having debugged rsync, ssh, and finally Cygwin...the problem turned out
>to be a D-Link router doing (a bad job of) QoS processing.
>
>Each of rsync, ssh, and Cygwin appear to have operated exactly correct,
>including pipe(), select(), stdin/stdout, and Windows socket handling.

That's good (and refreshing) to know.  Thanks for closing the loop.

cgf
