public inbox for cygwin@cygwin.com
* corruption in socket layer?
@ 1997-10-06 14:47 Tim Newsham
  0 siblings, 0 replies; 6+ messages in thread
From: Tim Newsham @ 1997-10-06 14:47 UTC (permalink / raw)
  To: gnu-win32

Hi,

    I occasionally observe some very strange behaviour in
the socket code.  It seems to me that there is corruption
going on somewhere.  I'm not sure if it's cygwin's problem
or winsock's problem.

What I'm seeing is the tcp/ip stack behave improperly at
times.  For example sometimes a close() happens and no
FIN is sent out.  Instead the connection is silently placed
into the closed state.  On the next incoming packet from
the remote, the local TCP stack sends out an RST.

It also appears that I sometimes get an EOF indication
(read returns zero) when the remote side has not closed
the connection, and I have not explicitly asked for
the connection to be closed.
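
One way to tell these two symptoms apart from application code is to
record how each connection ends. The following is a minimal Python
sketch, not taken from the programs described in this message: `drain`
is a hypothetical helper name, and a blocking stream socket is assumed.
A zero-length read is reported as "eof", and a reset delivered by the
local stack is reported as "reset".

```python
import socket


def drain(sock):
    """Read until the connection ends; return (data, how_it_ended)."""
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except ConnectionResetError:
            # The peer's stack answered with an RST instead of a FIN.
            return b"".join(chunks), "reset"
        if not data:
            # recv() returned zero bytes: an orderly EOF indication.
            return b"".join(chunks), "eof"
        chunks.append(data)
```

Logging the second element of the result per connection would show
whether an unexpected end of stream was an EOF or a reset.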

What is strange about both of these is that I cannot
reproduce it in simple test cases, and the problems
happen intermittently.  I have some (large) programs that
exhibit this behavior reliably, but it seems to depend
on a number of factors:  The program iterates over many
connections and fails consistently in one iteration with
one version of cygwin.dll, and consistently in a later
iteration with a different version of the .dll.  It
also seems to fail much later when debugging (STRACE=1,out)
is turned on.  This seems to indicate some sort of
race condition, or possibly a corruption bug that is
trashing some internal state.

I see this behaviour in all the cygwin.dll's I have tried
it on, and am guessing that this problem has existed in
all versions of the dll (possibly a winsock bug).

Has anyone observed this behavior and looked into it?
Is it possible for my program to be trashing the cygwin
internal state, or would corruption of winsock state
have to be in the winsock dll itself?

                                         Tim N.

-
For help on using this list (especially unsubscribing), send a message to
"gnu-win32-request@cygnus.com" with one line of text: "help".

* Re: corruption in socket layer?
@ 1997-10-10 15:04 Jim Thompson
  0 siblings, 0 replies; 6+ messages in thread
From: Jim Thompson @ 1997-10-10 15:04 UTC (permalink / raw)
  To: gnu-win32; +Cc: DKhosla, colin, jes, pacoid, pj

Colin (and Microsoft) should read RFC793:

  CLOSE is an operation meaning "I have no more data to send."  The
  notion of closing a full-duplex connection is subject to ambiguous
  interpretation, of course, since it may not be obvious how to treat
  the receiving side of the connection.  We have chosen to treat CLOSE
  in a simplex fashion.  The user who CLOSEs may continue to RECEIVE
  until he is told that the other side has CLOSED also.  Thus, a program
  could initiate several SENDs followed by a CLOSE, and then continue to
  RECEIVE until signaled that a RECEIVE failed because the other side
  has CLOSED.  We assume that the TCP will signal a user, even if no
  RECEIVEs are outstanding, that the other side has closed, so the user
  can terminate his side gracefully.  A TCP will reliably deliver all
  buffers SENT before the connection was CLOSED so a user who expects no
  data in return need only wait to hear the connection was CLOSED
  successfully to know that all his data was received at the destination
  TCP.  Users must keep reading connections they close for sending until
  the TCP says no more data.

The second to last sentence makes the behavior of a conformant TCP clear.

Jim
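
The simplex CLOSE described in the RFC 793 passage above corresponds to
a half-close in the sockets API. The following is a minimal Python
sketch under stated assumptions: `half_close_exchange` is a hypothetical
name, and a local `socket.socketpair()` stands in for a real TCP
connection. The client sends, shuts down its sending direction, and
keeps receiving until the other side has closed too.

```python
import socket
import threading


def half_close_exchange(request: bytes) -> bytes:
    """Send request, half-close, then read until the peer closes."""
    client, server = socket.socketpair()

    def serve():
        # Peer: read until EOF (the client's CLOSE), then reply and close.
        chunks = []
        while True:
            data = server.recv(4096)
            if not data:
                break
            chunks.append(data)
        server.sendall(b"got " + b"".join(chunks))
        server.close()

    worker = threading.Thread(target=serve)
    worker.start()

    client.sendall(request)
    client.shutdown(socket.SHUT_WR)   # "I have no more data to send."

    reply = []
    while True:
        data = client.recv(4096)      # still allowed to RECEIVE
        if not data:                  # the other side has CLOSED as well
            break
        reply.append(data)
    client.close()
    worker.join()
    return b"".join(reply)
```

Calling shutdown() rather than close() is what keeps the receiving
direction usable after the sender has finished.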

* RE: corruption in socket layer?
@ 1997-10-09 15:19 Colin Peters
  0 siblings, 0 replies; 6+ messages in thread
From: Colin Peters @ 1997-10-09 15:19 UTC (permalink / raw)
  To: 'Khosla, Deepak'; +Cc: 'GNU-Win32'

Khosla, Deepak[SMTP:DKhosla@Compaq.com] wrote:
>We ran into a similar (yet different) problem - with WINSOCK you 
>cannot guarantee that on a close all the data is flushed. You better 
>do some application level handshaking. We saw that even if we did a 
>close with a timeout period, the lower level never told you whether 
>the timeout expired or whether the data was actually sent. It appears 
>that the state goes to FIN_WAIT even though a FIN has not been sent 
>out and then a close is successful! The condition we saw this in was 
>if the remote had set the window size to zero while at the same time 
>we had only enough data left that would fit in the internal TCP 
>buffers. We would do a close, it would return successful and we would 
>go on trying to establish a new connection to the same remote and it 
>would send a reset back since from its point of view, the previous 
>session was alive.
>
>The bug was reported to MS a few months back - no response other than 
>that's how Winsock works :-)

I don't want to start any big arguments, but, if you are using Winsock,
and according to how I read the documentation I have available to me,
that *is* how Winsock works... which is not to say that's good but...
Even if you use setsockopt to turn off SO_DONTLINGER (which is on by
default) and set a SO_LINGER timeout there is no guarantee that the
data waiting in the TCP buffer has been sent when closesocket returns.
I would expect closesocket to return WSAETIMEDOUT in such a case, but
the documentation doesn't say that it should. Perhaps a newer version
of the spec says that it will and in order to force a close you have
to set SO_LINGER l_linger option to zero, but that type of thing
isn't mentioned anywhere in the spec I have. It just says that
closesocket waits until the timeout is up (if any) and then closes the
connection, even if data is waiting (and obviously one shouldn't send
a FIN if there is TCP data waiting to be sent... well, I guess that's
obvious). If the WSAETIMEDOUT (or some other) error was used, then at
least we could be sure that the data got through if we didn't get a
timeout. The man pages on the Sun here aren't clear on this point
either (they don't indicate that a close will return ETIMEDOUT in such
a case, or ever, and the setsockopt page does not discuss what happens
exactly when the timeout expires for a SO_LINGER set socket when it is
closed).
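
For reference, setting a linger timeout as discussed above might look
like the following Python sketch. The helper names (`pack_linger`,
`set_linger`, `get_linger`) are hypothetical, and the struct layout is
an assumption about the platform: two unsigned shorts on Windows, two
ints elsewhere. Enabling SO_LINGER is assumed here to be the way to
turn SO_DONTLINGER off.

```python
import socket
import struct
import sys

# Assumed layout of struct linger: {l_onoff, l_linger}.
LINGER_FMT = "HH" if sys.platform == "win32" else "ii"


def pack_linger(enabled: bool, seconds: int) -> bytes:
    """Build the raw struct linger value for setsockopt()."""
    return struct.pack(LINGER_FMT, int(enabled), seconds)


def set_linger(sock: socket.socket, seconds: int) -> None:
    """Ask close() to wait up to `seconds` for unsent data to drain."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    pack_linger(True, seconds))


def get_linger(sock: socket.socket):
    """Read back the setting as (enabled, seconds)."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FMT))
    onoff, seconds = struct.unpack(LINGER_FMT, raw)
    return bool(onoff), seconds
```

Nothing in this sketch reports whether the buffered data was actually
delivered before the timeout expired, which is the gap described above.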

From Winsock's (or sockets') point of view when the application calls
close it is saying (I think) "I am finished with this connection
completely, free up the resources associated with it." You can't do
anything with a socket after closesocket returns, but it would still
be worth it to return a timeout error, so that applications could
notify the user of improperly terminated connections. However, the
specs don't seem to say anything about it. As things stand, I suppose
it is correct to say that applications must do some final handshaking
to ensure that data transfer is complete, but I don't think that is,
directly, the implementation's fault.

Colin.

-- Colin Peters - Saga Univ. Dept. of Information Science
-- colin@bird.fu.is.saga-u.ac.jp - finger for PGP public key
-- http://www.fu.is.saga-u.ac.jp/~colin/index.html
-- http://www.geocities.com/Tokyo/Towers/6162/


* RE: corruption in socket layer?
@ 1997-10-07 12:32 Khosla, Deepak
  0 siblings, 0 replies; 6+ messages in thread
From: Khosla, Deepak @ 1997-10-07 12:32 UTC (permalink / raw)
  To: 'Sergey Okhapkin', 'gnu-win32@cygnus.com',
	'Tim Newsham'

We ran into a similar (yet different) problem - with WINSOCK you 
cannot guarantee that on a close all the data is flushed. You better 
do some application level handshaking. We saw that even if we did a 
close with a timeout period, the lower level never told you whether 
the timeout expired or whether the data was actually sent. It appears 
that the state goes to FIN_WAIT even though a FIN has not been sent 
out and then a close is successful! The condition we saw this in was 
if the remote had set the window size to zero while at the same time 
we had only enough data left that would fit in the internal TCP 
buffers. We would do a close, it would return successful and we would 
go on trying to establish a new connection to the same remote and it 
would send a reset back since from its point of view, the previous 
session was alive.

The bug was reported to MS a few months back - no response other than 
that's how Winsock works :-)
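
The application-level handshaking suggested above could take many
forms. The following is one minimal Python sketch, not the protocol
used in the case described: the length-prefixed framing, the one-byte
ACK, and the names `send_confirmed`, `receive_and_ack` and `recv_exact`
are all assumptions made for illustration. The sender treats the data
as delivered only once the receiver has acknowledged it, and closes
afterwards.

```python
import socket
import struct

ACK = b"\x06"  # hypothetical one-byte acknowledgement


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before sending all data")
        buf.extend(chunk)
    return bytes(buf)


def send_confirmed(sock, payload, timeout=5.0):
    """Send payload; return True only if the peer acknowledged it."""
    sock.settimeout(timeout)
    try:
        # 4-byte big-endian length prefix, then the payload itself.
        sock.sendall(struct.pack("!I", len(payload)) + payload)
        return recv_exact(sock, 1) == ACK
    except OSError:
        # Covers timeouts, resets, broken pipes and early EOF.
        return False


def receive_and_ack(sock):
    """Receive one length-prefixed payload and acknowledge it."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    payload = recv_exact(sock, length)
    sock.sendall(ACK)
    return payload
```

With this arrangement the sender learns about a stalled or lost
transfer from the missing acknowledgement rather than from the return
value of close().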

Regards

Deepak Khosla
281-514-9234
DKhosla@compaq.com

-----Original Message-----
From:	Sergey Okhapkin [SMTP:sos@prospect.com.ru]
Sent:	Tuesday, October 07, 1997 3:06 AM
To:	gnu-win32@cygnus.com; 'Tim Newsham'
Subject:	RE: corruption in socket layer?

Tim Newsham wrote:
> What I'm seeing is the tcp/ip stack behave improperly at
> times.  For example sometimes a close() happens and no
> FIN is sent out.  Instead the connection is silently placed
> into the closed state.  On the next incoming packet from
> the remote, the local TCP stack sends out an RST.
>
> I see this behaviour in all the cygwin.dll's I have tried
> it on, and am guessing that this problem has existed in
> all versions of the dll (possibly a winsock bug).
>

Would you like to write a win32 native test suite for a problem not 
using cygwin.dll?

--
Sergey Okhapkin, http://www.lexa.ru/sos
Moscow, Russia
Looking for a job.



* Re: corruption in socket layer?
  1997-10-07  0:07 Sergey Okhapkin
@ 1997-10-07 12:32 ` Tim Newsham
  0 siblings, 0 replies; 6+ messages in thread
From: Tim Newsham @ 1997-10-07 12:32 UTC (permalink / raw)
  To: Sergey Okhapkin; +Cc: gnu-win32, newsham

> Tim Newsham wrote:
> > What I'm seeing is the tcp/ip stack behave improperly at
> > times.  For example sometimes a close() happens and no
> > FIN is sent out.  Instead the connection is silently placed
> > into the closed state.  On the next incoming packet from
> > the remote, the local TCP stack sends out an RST.
> > 
> > I see this behaviour in all the cygwin.dll's I have tried
> > it on, and am guessing that this problem has existed in
> > all versions of the dll (possibly a winsock bug).
> > 
> 
> Would you like to write a win32 native test suite for a problem not using cygwin.dll?

If I knew how to reproduce it I would, but I haven't yet been
able to reproduce it in any small test cases.  

> Sergey Okhapkin, http://www.lexa.ru/sos


* RE: corruption in socket layer?
@ 1997-10-07  0:07 Sergey Okhapkin
  1997-10-07 12:32 ` Tim Newsham
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Okhapkin @ 1997-10-07  0:07 UTC (permalink / raw)
  To: gnu-win32, 'Tim Newsham'

Tim Newsham wrote:
> What I'm seeing is the tcp/ip stack behave improperly at
> times.  For example sometimes a close() happens and no
> FIN is sent out.  Instead the connection is silently placed
> into the closed state.  On the next incoming packet from
> the remote, the local TCP stack sends out an RST.
> 
> I see this behaviour in all the cygwin.dll's I have tried
> it on, and am guessing that this problem has existed in
> all versions of the dll (possibly a winsock bug).
> 

Would you like to write a win32 native test suite for a problem not using cygwin.dll?

-- 
Sergey Okhapkin, http://www.lexa.ru/sos
Moscow, Russia
Looking for a job.



end of thread, other threads:[~1997-10-10 15:04 UTC | newest]

Thread overview: 6+ messages
1997-10-06 14:47 corruption in socket layer? Tim Newsham
1997-10-07  0:07 Sergey Okhapkin
1997-10-07 12:32 ` Tim Newsham
1997-10-07 12:32 Khosla, Deepak
1997-10-09 15:19 Colin Peters
1997-10-10 15:04 corruption in socket layer? Jim Thompson
