VM and non-blocking writes

public inbox for cygwin@cygwin.com

* VM and non-blocking writes
@ 2007-12-13 17:31 Wayne Christopher
  2007-12-13 17:45 ` Dave Korn
  2007-12-13 17:59 ` Corinna Vinschen
  0 siblings, 2 replies; 20+ messages in thread
From: Wayne Christopher @ 2007-12-13 17:31 UTC
  To: cygwin

I have a server application that runs on XP under the latest cygwin, 
that opens up a socket connection to a client on another system, makes 
that socket non-blocking using fcntl(.... O_NDELAY), and then feeds the 
client a large file (100's of MBs) by doing the following:

1. call write() with the entire size of the data not yet written

2. the return value of write is the number of bytes actually written 
(should be limited by the socket buffer size - it is on linux)

3. select() for writable status on the socket (and do other things in 
the meantime)

4. when the socket becomes writable, goto 1
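
The loop in steps 1-4 can be sketched in C as follows. This is a minimal illustration under stated assumptions, not Wayne's server code: the helper names `set_nonblocking` and `write_all_nonblocking` are hypothetical, and the sketch sets O_NONBLOCK (the POSIX flag; O_NDELAY is an older equivalent on many systems) by OR-ing it into the existing flags. On a POSIX stack, write() on a non-blocking socket may accept fewer bytes than offered, or fail with -1 and errno EAGAIN/EWOULDBLOCK, and the caller then waits in select() before retrying.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode without clobbering
   its other file status flags (F_SETFL with O_NDELAY alone would). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper implementing steps 1-4.  Returns 0 once all `len`
   bytes have been accepted, or -1 with errno set on a real error. */
int write_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t pos = 0;

    assert(fd >= 0);
    while (pos < len) {
        /* Step 1: offer everything that has not been written yet. */
        ssize_t n = write(fd, buf + pos, len - pos);
        if (n > 0) {
            /* Step 2: a short count is normal; advance by what was taken. */
            pos += (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Step 3: wait until the socket is writable again (a real server
           would also watch its other descriptors here). */
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0 && errno != EINTR)
            return -1;
        /* Step 4: loop back to step 1. */
    }
    return 0;
}
```

With this structure, the amount accepted per write() is whatever the kernel is willing to buffer, which is exactly the quantity that behaved unexpectedly on Windows in this thread.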

What I see is that no matter how large the size is that I give to 
write(), the return value is always the full size.  Also, I see the 
virtual memory used by my process go way up - in fact it goes up by much 
more than the amount of data I've written.

I tried putting in a limit of 10KB in the size given to the write() 
call.  I still see the VM size grow - more slowly this time, but it 
eventually reaches 1.5GB and then I'm out of memory.

Has anybody seen this behavior?  Should I not be using O_NDELAY?  Any 
other workarounds?

I don't have a simple example program but I can make one if that will help.

Thanks,

    Wayne

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

* RE: VM and non-blocking writes
  2007-12-13 17:31 VM and non-blocking writes Wayne Christopher
@ 2007-12-13 17:45 ` Dave Korn
  2007-12-13 17:59 ` Corinna Vinschen
  1 sibling, 0 replies; 20+ messages in thread
From: Dave Korn @ 2007-12-13 17:45 UTC
  To: cygwin

On 13 December 2007 17:35, Wayne Christopher wrote:

> What I see is that no matter how large the size is that I give to
> write(), the return value is always the full size.  Also, I see the
> virtual memory used by my process go way up - in fact it goes up by much
> more than the amount of data I've written.
> 
> I tried putting in a limit of 10KB in the size given to the write()
> call.  I still see the VM size grow - more slowly this time, but it
> eventually reaches 1.5GB and then I'm out of memory.
> 
> Has anybody seen this behavior?  

  Dunno about anyone else, but I haven't.

> Should I not be using O_NDELAY?  

  It's supposed to work; it's always possible you've hit on a bug.

> Any other workarounds?
> 
> I don't have a simple example program but I can make one if that will help.

  Yes, it would be a good idea; let's see if anyone can reproduce the problem.

  What firewall/antispyware/other net-related security software do you have
installed?


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....


* Re: VM and non-blocking writes
  2007-12-13 17:31 VM and non-blocking writes Wayne Christopher
  2007-12-13 17:45 ` Dave Korn
@ 2007-12-13 17:59 ` Corinna Vinschen
  2007-12-13 19:16   ` Wayne Christopher
  1 sibling, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-13 17:59 UTC
  To: cygwin

On Dec 13 09:34, Wayne Christopher wrote:
> I have a server application that runs on XP under the latest cygwin, that 
> opens up a socket connection to a client on another system, makes that 
> socket non-blocking using fcntl(.... O_NDELAY), and then feeds the client a 
> large file (100's of MBs) by doing the following:
>
> 1. call write() with the entire size of the data not yet written
>
> 2. the return value of write is the number of bytes actually written 
> (should be limited by the socket buffer size - it is on linux)
>
> 3. select() for writable status on the socket (and do other things in the 
> meantime)
>
> 4. when the socket becomes writable, goto 1
>
> What I see is that no matter how large the size is that I give to write(), 
> the return value is always the full size.  Also, I see the virtual memory 
> used by my process go way up - in fact it goes up by much more than the 
> amount of data I've written.
>
> I tried putting in a limit of 10KB in the size given to the write() call.  
> I still see the VM size grow - more slowly this time, but it eventually 
> reaches 1.5GB and then I'm out of memory.
>
> Has anybody seen this behavior?  Should I not be using O_NDELAY?  Any other 
> workarounds?

I've never seen this behaviour.  Nonblocking sockets usually work fine;
ssh uses them, too.  A quick scan through the call chain (write ->
writev -> sendmsg -> WSASendTo) doesn't show any memory allocation
that isn't freed again.  Practically everything is done on the
stack.

The return code of write is an error code or the number of bytes
written as returned by the WSASendTo function.  If it really behaves as
you describe, there would be nothing Cygwin could do about that.
However, I'd expect that WSASendTo frequently returns SOCKET_ERROR with
the error code set to WSAEWOULDBLOCK, which translates to a return code
-1 from write with errno set to EAGAIN.
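
The outcome Corinna expects can be checked with a small probe. The sketch below assumes a Linux-like POSIX system and uses an AF_UNIX socketpair as a stand-in for the TCP connection; the helper name `fill_until_eagain` is hypothetical. It writes 10K blocks (the size Corinna tried later in this thread) to a non-blocking socket whose peer never reads, and reports how many bytes were accepted before write() failed with EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: write `chunk`-byte blocks to a non-blocking fd whose
   peer never reads, until write() reports EAGAIN/EWOULDBLOCK or `limit`
   bytes have been accepted.  Returns the byte count accepted, or -1 on
   any other error. */
long fill_until_eagain(int fd, size_t chunk, size_t limit)
{
    char buf[10240];              /* 10K, the block size tried in this thread */
    size_t total = 0;

    assert(chunk > 0);
    if (chunk > sizeof buf)
        chunk = sizeof buf;
    memset(buf, 'x', chunk);

    while (total < limit) {
        ssize_t n = write(fd, buf, chunk);
        if (n > 0) {
            total += (size_t) n;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                /* send buffer full: the expected POSIX outcome */
        } else {
            return -1;
        }
    }
    return (long) total;
}
```

On Linux the total is roughly bounded by the socket's send buffer size; the Windows results reported in this thread accepted far more before blocking.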

Are you absolutely sure you're not wasting the memory yourself, somehow?
Or, is it possible that there's a strange interaction with some piece of
firewall or virus scanner?

> I don't have a simple example program but I can make one if that will help.

Yes, please.  If it's actually a problem in Cygwin, it only occurs
in some corner cases.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: VM and non-blocking writes
  2007-12-13 17:59 ` Corinna Vinschen
@ 2007-12-13 19:16   ` Wayne Christopher
  2007-12-14 11:15     ` Corinna Vinschen
  0 siblings, 1 reply; 20+ messages in thread
From: Wayne Christopher @ 2007-12-13 19:16 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

Okay, here's my test program.  Compile and run with no arguments, then
connect to it from another machine - on a linux box I just did:

python
import socket
s = socket.socket()
s.connect(("name-of-windows-box", 12345))

At this point, nbcheck printed:

listening to port 12345 on host xp1 (10.1.2.40)
got connection from 10.1.2.14
trying to write 100000000
100000000 bytes written

When I hit return to exit from nbcheck, it does not actually exit until
the remote socket is closed.

The VM usage is 100M, which is all the data array that I allocated, so
it doesn't look like the write() call allocated anything in my process
space.

This behavior makes some sense to me, but it's not how I expect it to
work (based on the write(2) man page and how it works on linux).  It's
more like asynchronous write than non-blocking write.  Using O_NONBLOCK
instead of O_NDELAY doesn't change the behavior.

Thanks,

     Wayne



[-- Attachment #2: nbcheck.c --]
[-- Type: text/x-c++src, Size: 1785 bytes --]


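#include <stdio.h>
#include <stdlib.h>      /* malloc, exit */
#include <string.h>      /* memcpy, memset */
#include <ctype.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/select.h>  /* select, fd_set */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_ntoa */
#include <fcntl.h>
#include <assert.h>

int main(void)
{
    int i, fd, fd2;
    socklen_t len;       /* accept() takes a socklen_t * */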
    struct hostent *hp;
    struct protoent *pp;
    char hostname[64];
    struct sockaddr_in lAddr, rAddr;
    char* data;
    int datalen, datapos;
    
    gethostname(hostname, 64);
    pp = getprotobyname("tcp");
    hp = gethostbyname(hostname);
    
    assert(pp && hp);
    
    fd = socket(AF_INET, SOCK_STREAM, pp->p_proto);
    assert(fd >= 0);
    
    lAddr.sin_family = hp->h_addrtype;
    memcpy((char *)&lAddr.sin_addr.s_addr, (char *)hp->h_addr,
	   sizeof(lAddr.sin_addr.s_addr));
    lAddr.sin_port = htons(12345);
    
    i = bind(fd, (struct sockaddr *)&lAddr, sizeof(lAddr));
    assert(i >= 0);
    
    printf("listening to port %d host %s (%s)\n", ntohs(lAddr.sin_port),
	   hostname, inet_ntoa(lAddr.sin_addr));
    i = listen(fd, 5);
    assert(i >= 0);
    
    len = sizeof(rAddr);
    memset(&rAddr, 0, sizeof(rAddr));
    fd2 = accept(fd, (struct sockaddr *)&rAddr, &len);
    assert(fd2 >= 0);
    
    printf("got connection from %s\n", inet_ntoa(rAddr.sin_addr));
    
    i = fcntl(fd2, F_SETFL, O_NDELAY);
    assert(i >= 0);
    
    datalen = (int) 1e8;
    data = (char *) malloc(datalen);
    assert(data != NULL);
    datapos = 0;
    
    while (datapos < datalen) {
	fd_set wfds;
	FD_ZERO(&wfds);
	FD_SET(fd2, &wfds);
	
	i = select(fd2 + 1, NULL, &wfds, NULL, NULL);
	assert(i == 1);
	
	printf("trying to write %d bytes\n", datalen - datapos);
	i = write(fd2, data + datapos, datalen - datapos);
	printf("%d bytes written\n", i);
	assert(i > 0);
	
	datapos += i;
	assert(datapos <= datalen);
    }
    printf("hit return to exit ");
    getchar();
    exit(0);
}



* Re: VM and non-blocking writes
  2007-12-13 19:16   ` Wayne Christopher
@ 2007-12-14 11:15     ` Corinna Vinschen
  2007-12-14 13:41       ` Corinna Vinschen
  2007-12-14 14:33       ` VM and non-blocking writes Corinna Vinschen
  0 siblings, 2 replies; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-14 11:15 UTC
  To: cygwin

On Dec 13 11:19, Wayne Christopher wrote:
> Okay, here's my test program.  Compile and run with no arguments, then
> connect to it from another machine - on a linux box I just did:
>
> python
> import socket
> s = socket.socket()
> s.connect(("name-of-windows-box", 12345))
>
> At this point, nbcheck printed:
>
> listening to port 12345 on host xp1 (10.1.2.40)
> got connection from 10.1.2.14
> trying to write 100000000
> 100000000 bytes written
>
> When I hit return to exit from nbcheck, it does not actually exit until
> the remote socket is closed.

This is due to a workaround for a problem in WinSock.  If you want
to make sure that your application shuts down gracefully, call
shutdown() and close() explicitly.  Otherwise Cygwin has to linger,
because not lingering resulted in data loss in some scenarios.
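
For reference, an explicit graceful shutdown on a POSIX system commonly looks like the sketch below: half-close the sending side with shutdown(SHUT_WR), drain whatever the peer still sends until EOF or a timeout, then close(). This is a generic pattern, not Cygwin's internal linger logic, and the helper name `graceful_close` and its `timeout_ms` parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send FIN, drain the peer until EOF or until no data
   arrives for `timeout_ms`, then release the descriptor.  Returns the
   number of bytes drained, or -1 if shutdown() failed (fd is closed
   either way). */
long graceful_close(int fd, int timeout_ms)
{
    char buf[4096];
    long drained = 0;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) < 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        struct pollfd p = { .fd = fd, .events = POLLIN };
        int r = poll(&p, 1, timeout_ms);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            break;                    /* timeout or poll error: give up */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            drained += n;
        else if (n < 0 && (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK))
            continue;
        else
            break;                    /* 0 means orderly EOF from the peer */
    }
    close(fd);
    return drained;
}
```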

> The VM usage is 100M, which is all the data array that I allocated, so
> it doesn't look like the write() call allocated anything in my process
> space.
>
> This behavior makes some sense to me, but it's not how I expect it to
> work (based on the write(2) man page and how it works on linux).  It's
> more like asynchronous write than non-blocking write.  Using O_NONBLOCK
> instead of O_NDELAY doesn't change the behavior.

I can reproduce this behaviour.  Stepping through the code shows that
the socket has been successfully switched to non-blocking (the WinSock
ioctlsocket function returns with success).  But the WinSock function
WSASendTo hangs for a while, then returns success (0) with the
number of bytes written set to 100000000.

Since the peer doesn't read these bytes, it appears that WSASendTo
creates a temporary buffer in kernel space and copies the full user
buffer into this temporary buffer.  When I raised the memory buffer to
512K, the WSASendTo function failed with WSAENOBUFS, "No buffer space
available."

This is really surprising.  The socket write buffer size on Windows is
usually 8K, afaik, if you don't change it with setsockopt(SO_SNDBUF).
Why it tries to buffer more than this 8K beats me.  I searched the net
for this problem but I didn't find any other report which would describe
such a weird behaviour.

However, I have to make some more tests, especially in a pure Win32
application to be sure that it's not a Cygwin problem only.

For the time being, I can only suggest using smaller user buffer sizes
in calls to send()/write().
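
The suggested workaround amounts to never handing write() more than a fixed number of bytes per call. A minimal sketch, assuming the application picks the cap itself (the helper name `write_capped` is hypothetical, and no particular cap value is implied by this thread):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: like write(), but never hands the kernel more than
   `cap` bytes in a single call (cap == 0 means "no cap").  The return
   value and errno are exactly those of the underlying write(). */
ssize_t write_capped(int fd, const void *buf, size_t len, size_t cap)
{
    assert(buf != NULL || len == 0);
    if (cap > 0 && len > cap)
        len = cap;
    return write(fd, buf, len);
}
```

The caller still loops on short writes and EAGAIN exactly as before; the cap only limits how much each individual call can ask the stack to buffer.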

Thanks for the testcase,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: VM and non-blocking writes
  2007-12-14 11:15     ` Corinna Vinschen
@ 2007-12-14 13:41       ` Corinna Vinschen
  2007-12-14 13:52         ` Corinna Vinschen
  2007-12-14 14:33       ` VM and non-blocking writes Corinna Vinschen
  1 sibling, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-14 13:41 UTC
  To: cygwin

On Dec 14 12:15, Corinna Vinschen wrote:
> On Dec 13 11:19, Wayne Christopher wrote:
> > Okay, here's my test program. 
> > [...]
> I can reproduce this behaviour.  Stepping through the code shows that
> the socket has been successfully switched to non-blocking (the WinSock
> ioctlsocket function returns with success).  But the WinSock function
> WSASendTo hangs for a while, then returns success (0) with the
> number of bytes written set to 100000000.
> 
> Since the peer doesn't read these bytes, it appears that WSASendTo
> creates a temporary buffer in kernel space and copies the full user
> buffer into this temporary buffer.  When I raised the memory buffer to
> 512K, the WSASendTo function failed with WSAENOBUFS, "No buffer space
> available."
> 
> This is really surprising.  The socket write buffer size on Windows is
> usually 8K, afaik, if you don't change it with setsockopt(SO_SNDBUF).
> Why it tries to buffer more than this 8K beats me.  I searched the net
> for this problem but I didn't find any other report which would describe
> such a weird behaviour.
> 
> However, I have to make some more tests, especially in a pure Win32
> application to be sure that it's not a Cygwin problem only.

I can reproduce this behaviour with a native Windows application.  It
does not depend on using WSASendTo vs. using send, and in case of
using WSASendTo it happens independently of using one big WSABUF
element or multiple smaller elements.  I can reproduce the behaviour
on Windows 2000 SP4, XP SP2 and Vista SP1 RC1, so it doesn't even
depend on the OS version.

On the other hand, as soon as I call send (or WSASendTo) multiple
times with smaller sizes (I tried with 10k), select starts to
block at some point.  But even then strange things happen.  After
some time (after 5 seconds, then after 14 seconds, then about every
60 seconds) select() just signals the socket ready for write and
the next send adds another 10K to the internal buffer.  A tcpdump
on the interface shows that no packet goes over the line... which
would be a surprise anyway, given that the peer does not even once
call read().

Given that, I don't see what we can do about this misbehaviour.
I guess I'll report this as a bug and see what the reaction will
be, especially whether there's a useful workaround.

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: VM and non-blocking writes
  2007-12-14 13:41       ` Corinna Vinschen
@ 2007-12-14 13:52         ` Corinna Vinschen
  2007-12-14 14:35           ` Lev Bishop
  0 siblings, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-14 13:52 UTC
  To: cygwin

On Dec 14 14:41, Corinna Vinschen wrote:
> On the other hand, as soon as I call send (or WSASendTo) multiple
> times with smaller sizes (I tried with 10k), select starts to
> block at some point.  But even then strange things happen.  After
> some time (after 5 seconds, then after 14 seconds, then about every
> 60 seconds) select() just signals the socket ready for write and
> the next send adds another 10K to the internal buffer.  A tcpdump
> on the interface shows that no package goes over the line... which
> would be a surprise anyway, given that the peer does not even once
> call read().

Hmm, a few minutes ago select() mysteriously blocked fully after send
had written 19 blocks of 10K each....


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: VM and non-blocking writes
  2007-12-14 13:52         ` Corinna Vinschen
@ 2007-12-14 14:35           ` Lev Bishop
  2007-12-14 17:25             ` Corinna Vinschen
  0 siblings, 1 reply; 20+ messages in thread
From: Lev Bishop @ 2007-12-14 14:35 UTC
  To: cygwin

On Dec 14, 2007 8:52 AM, Corinna Vinschen wrote:
> On Dec 14 14:41, Corinna Vinschen wrote:
> > On the other hand, as soon as I call send (or WSASendTo) multiple
> > times with smaller sizes (I tried with 10k), select starts to
> > block at some point.  But even then strange things happen.  After
> > some time (after 5 seconds, then after 14 seconds, then about every
> > 60 seconds) select() just signals the socket ready for write and
> > the next send adds another 10K to the internal buffer.  A tcpdump
> > on the interface shows that no package goes over the line... which
> > would be a surprise anyway, given that the peer does not even once
> > call read().
>
> Hmm, a few minutes ago select() mysteriously blocked fully after send
> had written 19 blocks of 10K each....

Good luck with figuring this stuff out. The way winsock deals with all
of this stuff is rather mysterious and quite hackish, basically
because it's all implemented in an emulation layer afd.sys and
msafd.dll which tries to give bsd socket syntax (or something sorta
close anyway) on top of the native overlapped io. The afd layer does
some mighty weird things. See, for example, my reverse engineering of
one aspect of its send buffer management here:
http://www.cygwin.com/ml/cygwin-patches/2006-q2/msg00031.html

There's a whole bunch of tuning parameters that deal with when afd
should make a copy of an application-supplied buffer (incurring the
copy costs) or just lock the application buffer in ram (incurring VM
manipulation costs) and so on. Look at registry configuration
parameters:
DefaultReceiveWindow, DefaultSendWindow, FastCopyReceiveThreshold,
FastSendDatagramThreshold, LargeBufferSize, LargeBufferListDepth,
MaxFastTransmit, MaxFastCopyTransmit, MediumBufferSize,
MediumBufferListDepth, OverheadChargeGranularity, PriorityBoost,
SmallBufferListDepth, SmallBufferSize, TransmitIoLength,
FFPControlFlags, FFPFastForwardingCacheSize, GlobalMaxTcpWindowSize
and probably others.

You can probably do something about this particular issue by tweaking
those parameters, or making sure you make the sends fall on the right
side of some boundary defined by those parameters. But in general....
I'm not confident.

Lev

* Re: VM and non-blocking writes
  2007-12-14 14:35           ` Lev Bishop
@ 2007-12-14 17:25             ` Corinna Vinschen
  2007-12-14 21:56               ` mmap failing Wayne Christopher
  0 siblings, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-14 17:25 UTC
  To: cygwin

On Dec 14 09:34, Lev Bishop wrote:
> http://www.cygwin.com/ml/cygwin-patches/2006-q2/msg00031.html

Gosh, I didn't even remember this discussion anymore, sorry.

> There's a whole bunch of tuning parameters that deal with when afd
> should make a copy of an application-supplied buffer [...]
> You can probably do something about this particular issue by tweaking
> those parameters, or making sure you make the sends fall on the right
> side of some boundary defined by those parameters. But in general....
> I'm not confident.

Same here.  Tweaking registry parameters is nothing the Cygwin DLL
should resort to.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* mmap failing
  2007-12-14 17:25             ` Corinna Vinschen
@ 2007-12-14 21:56               ` Wayne Christopher
  2007-12-16 11:04                 ` Corinna Vinschen
  2008-01-07 16:03                 ` Linda Walsh
  0 siblings, 2 replies; 20+ messages in thread
From: Wayne Christopher @ 2007-12-14 21:56 UTC
  To: cygwin

I have a 268MB file open for writing.  I close it and then
immediately try to mmap() it, and I get ENOMEM.  However, I do have the
VM space available and can malloc() the size of the file right after the 
failure.  Also, I have mmap()'ed other similar files in the same program 
before this, but these had not just been closed.

My initial guess was that it was timing related, but if I wait for 5
seconds and try again I still get the failure.

I wasn't able to duplicate it in a small example since my app has a 
bunch of threads and is doing other stuff at the same time.

Any suggestions for solutions or workarounds?  Maybe strategic use of 
fsync() ?
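
For reference, the write / close / mmap sequence being described looks roughly like this on a POSIX system. The fsync() before close() follows Wayne's guess and is not a confirmed fix for the ENOMEM failure; the helper name `write_then_map` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path`, fsync and close it,
   then reopen read-only and map it.  Returns the mapping, or MAP_FAILED
   with errno set (ENOMEM is the failure reported in this thread). */
void *write_then_map(const char *path, const char *data, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return MAP_FAILED;

    for (size_t pos = 0; pos < len; ) {
        ssize_t n = write(fd, data + pos, len - pos);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            close(fd);
            return MAP_FAILED;
        }
        pos += (size_t) n;
    }

    int sync_rc = fsync(fd);          /* Wayne's guess; not a confirmed fix */
    if (close(fd) < 0 || sync_rc < 0)
        return MAP_FAILED;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    void *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    int saved_errno = errno;
    close(fd);                        /* the mapping outlives the descriptor */
    errno = saved_errno;
    return base;
}
```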

Thanks,

     Wayne

* Re: mmap failing
  2007-12-14 21:56               ` mmap failing Wayne Christopher
@ 2007-12-16 11:04                 ` Corinna Vinschen
  2007-12-17 18:47                   ` Wayne Christopher
  2008-01-07 16:03                 ` Linda Walsh
  1 sibling, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-16 11:04 UTC
  To: cygwin

On Dec 14 13:59, Wayne Christopher wrote:
> I have a 268MB file open for writing.  I close it and then
> immediately try to mmap() it, and I get ENOMEM.  However, I do have the
> VM space available and can malloc() the size of the file right after the 
> failure.  Also, I have mmap()'ed other similar files in the same program 
> before this, but these had not just been closed.
>
> My initial guess was that it was timing related, but if I wait for 5
> seconds and try again I still get the failure.
>
> I wasn't able to duplicate it in a small example since my app has a bunch 
> of threads and is doing other stuff at the same time.
>
> Any suggestions for solutions or workarounds?

Not without testcase and version information.  Is that under 1.5.24-2
or 1.5.25-x?  Could you test if the result differs between these two
versions?  If so, how?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: mmap failing
  2007-12-16 11:04                 ` Corinna Vinschen
@ 2007-12-17 18:47                   ` Wayne Christopher
  0 siblings, 0 replies; 20+ messages in thread
From: Wayne Christopher @ 2007-12-17 18:47 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

My test program is attached.  This example works but in my real program 
the same write+close+mmap sequence did not.

It appears that calling fsync before the close sometimes avoids the 
error but not always.

This is Cygwin 1.5.24(0.156/4/2).

Any thoughts?  Thanks,

    Wayne

Corinna Vinschen wrote:
> On Dec 14 13:59, Wayne Christopher wrote:
>   
>> I have a 268MB file open for writing.  I close it and then
>> immediately try to mmap() it, and I get ENOMEM.  However, I do have the
>> VM space available and can malloc() the size of the file right after the 
>> failure.  Also, I have mmap()'ed other similar files in the same program 
>> before this, but these had not just been closed.
>>
>> My initial guess was that it was timing related, but if I wait for 5
>> seconds and try again I still get the failure.
>>
>> I wasn't able to duplicate it in a small example since my app has a bunch 
>> of threads and is doing other stuff at the same time.
>>
>> Any suggestions for solutions or workarounds?
>>     
>
> Not without testcase and version information.  Is that under 1.5.24-2
> or 1.5.25-x?  Could you test if the result differs between these two
> versions?  If so, how?
>
>
> Corinna
>
>   


[-- Attachment #2: mmcheck.c --]
[-- Type: text/x-c++src, Size: 795 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/unistd.h>
#include <sys/types.h>   /* caddr_t */

#define SIZE 268000000

int main(void)
{
    const char* fname = "test_file";
    char* data;
    int fd, i;
    struct stat sb;
    caddr_t base;

    data = malloc(SIZE);
    assert(data != NULL);
    fd = open(fname, O_RDWR|O_CREAT, 0666);
    assert(fd >= 0);
    
    i = write(fd, data, SIZE);
    assert(i == SIZE);
    close(fd);

    i = stat(fname, &sb);
    assert(i >= 0);
    
    assert(SIZE == sb.st_size);
    
    fd = open(fname, O_RDONLY, 0);
    assert(fd >= 0);
    
    base = (caddr_t) mmap(NULL, SIZE, PROT_READ, MAP_SHARED, fd, 0);
    printf("base = %p\n", (void *) base);
    if (MAP_FAILED == base)
	perror("mmap");
    
    exit(0);
}


* Re: mmap failing
  2007-12-14 21:56               ` mmap failing Wayne Christopher
  2007-12-16 11:04                 ` Corinna Vinschen
@ 2008-01-07 16:03                 ` Linda Walsh
  1 sibling, 0 replies; 20+ messages in thread
From: Linda Walsh @ 2008-01-07 16:03 UTC
  To: cygwin; +Cc: wayne

  Wayne Christopher wrote:
> I have a 268MB file open for writing.  I close it and then
> immediately try to mmap() it, and I get ENOMEM.  However, I do have the
> VM space available and can malloc() the size of the file right after the 
> failure.  Also, I have mmap()'ed other similar files in the same program 
> before this, but these had not just been closed.
> 
> Any suggestions for solutions or workarounds?  Maybe strategic use of 
> fsync() ?
----
	Don't know if this problem was 'solved' or not, but I'm guessing
that mmap attempts to reserve a large chunk of virtual address space to
map the file into (268MB in your example).  The 'rub' is that those 268MB
must be contiguous.  Over time the free areas of the address space get
fragmented, so when a large block is requested, Windows may be unable to
find a single free region big enough, and the call fails with an error
even though the total amount of free memory is sufficient.
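
One way to test the fragmentation theory from inside the process is to ask how large a single contiguous region mmap() can still reserve at the moment of failure. The probe below is a hypothetical diagnostic, not something proposed in this thread: `largest_reservable` binary-searches with anonymous PROT_NONE mappings, which on typical POSIX systems reserve address space without committing memory, and unmaps each probe immediately. It assumes that if a region of size s can be reserved, any smaller region can be too.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANONYMOUS even in strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical diagnostic: largest size <= limit (rounded down to whole
   pages) for which one contiguous PROT_NONE reservation succeeds now. */
size_t largest_reservable(size_t limit)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t lo = 0, hi = limit / page;      /* search in units of pages */

    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;
        void *p = mmap(NULL, mid * page, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) {
            munmap(p, mid * page);
            lo = mid;                      /* mid pages fit: try larger */
        } else {
            hi = mid - 1;                  /* too big: try smaller */
        }
    }
    return lo * page;
}
```

Calling, say, `largest_reservable(268000000)` right before the failing mmap() would show whether a 268MB hole still exists in the address space. In a 64-bit process the limit itself is likely to come back.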

	I'm not sure what you can use to solve this at the cygwin-application
level, but at the windows-application level Microsoft offers something
called a 'low fragmentation heap' (LFH), that tries to lower fragmentation
for many applications.  According to http://support.microsoft.com/kb/929136,
it can be turned off 'accidentally' in some situations which can cause
more heap-fragmentation problems.  It's unlikely most users would
encounter this problem.

	Windows has more than one memory allocation pool that it uses
(main heap, for example instead of low-frag heap) and I would guess
more than one of them could get overly fragmented.  I'd think the
system would 'cleanup' at some point.  I've only encountered the
'out-of-mem' (when I had enough, but alloc size was too large) once
that I remember.  But when I immediately retried the program,
the error had 'gone away'...

Linda

* Re: VM and non-blocking writes
  2007-12-14 11:15     ` Corinna Vinschen
  2007-12-14 13:41       ` Corinna Vinschen
@ 2007-12-14 14:33       ` Corinna Vinschen
  2007-12-15 17:39         ` Robert Pendell
  1 sibling, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-14 14:33 UTC
  To: cygwin

On Dec 14 12:15, Corinna Vinschen wrote:
>   I searched the net
> for this problem but I didn't find any other report which would describe
> such a weird behaviour.

Obviously I searched wrong.  There are reports about this behaviour
since at least 1998 and it has never been fixed.  These two links
might be interesting:

  http://support.microsoft.com/kb/q201213/
  http://tinyurl.com/2brokp


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: VM and non-blocking writes
  2007-12-14 14:33       ` VM and non-blocking writes Corinna Vinschen
@ 2007-12-15 17:39         ` Robert Pendell
  2007-12-16 13:42           ` Corinna Vinschen
  0 siblings, 1 reply; 20+ messages in thread
From: Robert Pendell @ 2007-12-15 17:39 UTC
  To: cygwin

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Corinna Vinschen wrote:
> On Dec 14 12:15, Corinna Vinschen wrote:
>>   I searched the net
>> for this problem but I didn't find any other report which would describe
>> such a weird behaviour.
> 
> Obviously I searched wrong.  There are reports about this behaviour
> since at least 1998 and it has never been fixed.  These two links
> might be interesting:
> 
>   http://support.microsoft.com/kb/q201213/
>   http://tinyurl.com/2brokp
> 
> 
> Corinna
> 

Do you have the test case you used for the pure win32 mode?  If you do
then maybe I can try and push to get this fixed for the next service
pack release for both XP and Vista as well as Server 2008.  This will
especially be the case if it can be easily reproduced.  A source and
binary version will be useful for this.  I am in the tech beta group for
Vista SP1, XP SP3, and Server 2008 so I can at least remind them of this
bug and show them a test case.  No guarantees that it will be fixed.

- --
Robert Pendell
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHZA73s1pR2j1qW+sRAqgIAJ44S32pjI8k2EzVNQeqV29uRarLigCghExr
L4nnayAMFPFWrrCOQlUpnfs=
=iltL
-----END PGP SIGNATURE-----


* Re: VM and non-blocking writes
  2007-12-15 17:39         ` Robert Pendell
@ 2007-12-16 13:42           ` Corinna Vinschen
  2007-12-16 14:07             ` Corinna Vinschen
  0 siblings, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-16 13:42 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 5989 bytes --]

On Dec 15 12:29, Robert Pendell wrote:
> Corinna Vinschen wrote:
> > Obviously I searched wrong.  There are reports about this behaviour
> > since at least 1998 and it has never been fixed.  These two links
> > might be interesting:
> > 
> >   http://support.microsoft.com/kb/q201213/
> >   http://tinyurl.com/2brokp
> 
> Do you have the test case you used for the pure win32 mode?

Sure, but before we start with this, a note:

  I'm contemplating working around this problem in Cygwin (not
  for 1.5.25, but in the main trunk) by capping the number of bytes in a
  single send call, according to the patch Lev sent in
  http://www.cygwin.com/ml/cygwin-patches/2006-q2/msg00031.html.

  Lev, are you interested in reworking your patch (minus the pipe stuff)
  to match current CVS?  Is there any gain in raising SO_SNDBUF/SO_RCVBUF
  to a value > 8K, especially in the light of my experiences commented
  on in net.cc, function fdsock()?
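
  A minimal sketch of the capping idea at the POSIX level, assuming a
  hypothetical wrapper name (capped_send) and a hypothetical 64K cap;
  it is not Lev's patch and not Cygwin's net.cc.  Since a non-blocking
  send() may legitimately return a short count, capping each call
  changes no documented semantics; the intent is only to keep Winsock
  from accepting a 100 MB buffer in one go.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical per-call cap; the value used by Lev's 2006 patch is not
   stated in this thread. */
#define SEND_CAP 65536

/* Pass at most SEND_CAP bytes to one send() call.  A short count is a
   normal result for a non-blocking stream socket, so callers that loop
   on the return value (as nbcheck.c does) keep working unchanged. */
ssize_t capped_send(int fd, const void *buf, size_t len, int flags)
{
  if (len > SEND_CAP)
    len = SEND_CAP;
  return send(fd, buf, len, flags);
}
```
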

Back to the testcase.  Source attached.  I created it so that it can be
built as a Cygwin or Linux executable

  $ gcc -g -o nbcheck nbcheck.c

as well as a native Windows application using mingw:

  $ gcc -g -mno-cygwin -o nbcheck-nat nbcheck.c -lws2_32

It takes the size of the user data buffer as optional argument, defaulting
to 100,000,000 bytes.

> If you do
> then maybe I can try and push to get this fixed for the next service
> pack release for both XP and Vista as well as Server 2008.  This will
> especially be the case if it can be easily reproduced.

Reproducing the issue is as easy as Wayne described.  Just start a
client application which connects but never reads, for instance by using
the python sequence Wayne used in his mail:

  $ python
  import socket
  s = socket.socket()
  s.connect(("name-of-windows-box", 12345))
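
For readers without python at hand, a C equivalent of that client is
sketched below, assuming POSIX sockets and a hypothetical function name
(connect_and_idle).  It connects and then never reads, so the server's
send buffer is never drained by the peer.

```c
#define _POSIX_C_SOURCE 200112L   /* getaddrinfo() under strict -std=c99 */
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

/* Connect to host:port and return the socket without ever reading from
   it, like the python snippet above.  Returns -1 on failure. */
int connect_and_idle(const char *host, const char *port)
{
  struct addrinfo hints, *res, *ai;
  int fd = -1;

  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host, port, &hints, &res) != 0)
    return -1;
  for (ai = res; ai != NULL; ai = ai->ai_next)
    {
      fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
      if (fd < 0)
        continue;
      if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
        break;                  /* connected: keep this socket */
      close(fd);
      fd = -1;
    }
  freeaddrinfo(res);
  return fd;
}
```

A caller would do e.g. connect_and_idle("name-of-windows-box", "12345")
and then simply sleep, for instance with pause() in a loop.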

If you add a second (arbitrary) argument, the testcase always tries to write
in 10,000-byte chunks.  This shows how select starts to block at
some point, in my case on XP SP2 after writing 190,000 bytes.

Result on Linux:

  $ ./nbcheck 500000000
  listening to port 12345 host linux-box (10.0.0.1)
  got connection from 10.0.0.3
  accepted socket is nonblocking now
  buffer size is 100000000 bytes
  trying to write 100000000 bytes
  65536 bytes written
  trying to write 99934464 bytes
  147456 bytes written
  [HANG in select]

  $ ./nbcheck 100000000
  listening to port 12345 host linux-box (10.0.0.1)
  got connection from 10.0.0.3
  accepted socket is nonblocking now
  buffer size is 100000000 bytes
  trying to write 100000000 bytes
  65536 bytes written
  trying to write 99934464 bytes
  147456 bytes written
  [HANG in select]

  $ ./nbcheck 100000000 x
  listening to port 12345 host linux-box (10.0.0.1)
  got connection from 10.0.0.3
  accepted socket is nonblocking now
  buffer size is 100000000 bytes
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  [HANG in select]

Result on Windows:

  $ ./nbcheck-nat 500000000
  listening to port 12345 host windows-box (10.0.0.2)
  got connection from 10.0.0.3
  accepted socket is nonblocking now
  buffer size is 500000000 bytes
  trying to write 500000000 bytes
  Err: 10055
  hit return to exit 

  $ ./nbcheck-nat 100000000
  listening to port 12345 host windows-box (10.0.0.2)
  got connection from 10.0.0.3
  accepted socket is nonblocking now
  buffer size is 100000000 bytes
  trying to write 100000000 bytes
  100000000 bytes written
  hit return to exit 

  $ ./nbcheck-nat 100000000 x
  listening to port 12345 host windows-box (10.0.0.2)
  got connection from 10.0.0.3
  accepted socket is nonblocking now
  buffer size is 100000000 bytes
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  trying to write 10000 bytes
  10000 bytes written
  [WAIT in select for 5 seconds]
  trying to write 10000 bytes
  10000 bytes written
  [WAIT in select for 14 seconds]
  trying to write 10000 bytes
  10000 bytes written
  [WAIT in select for about 60 seconds]
  trying to write 10000 bytes
  10000 bytes written
  [WAIT in select for about 60 seconds]
  [a couple of times, but not always the same]
  trying to write 10000 bytes
  10000 bytes written
  [HANG in select]

In one test run the hang occurred after 160,000 bytes, in another after
190,000 bytes.  I have no idea whether there's some rule behind that.

> A source and binary version will be useful for this.

Creating a binary is easy, see above.

> I am in the tech beta group for
> Vista SP1, XP SP3, and Server 2008 so I can at least remind them of this
> bug and show them a test case.  No guarantees that it will be fixed.

Actually, given that this behaviour has been known for at least 10 years, I
doubt that it will even be accepted as a bug.  But you should never give
up hope, right? :)


Thanks for your offer,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: nbcheck.c --]
[-- Type: text/x-c++src, Size: 3154 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#ifdef _WIN32

/* winsock2.h must come before windows.h; otherwise windows.h pulls in
   the old winsock.h and the declarations clash. */
#include <winsock2.h>
#include <windows.h>

WSADATA wsadata;

#define SOCKLEN_T int

#else	// Assume Unix-like system

#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <errno.h>

#define SOCKET int
#define WSADATA int
#define WSAStartup(a,b)
#define SOCKET_ERROR -1
#define SOCKLEN_T socklen_t
#define WSAGetLastError()	(errno)
#define SD_BOTH SHUT_RDWR
#define closesocket close
#define WSACleanup()

#endif

int
main(int argc, char **argv)
{
  int i;
  SOCKET fd, fd2;
  struct hostent *hp;
  struct protoent *pp;
  char hostname[64];
  struct sockaddr_in lAddr, rAddr;
  char* data;
  size_t datalen, datapos;
  
  WSAStartup (MAKEWORD(2,2), &wsadata);
  gethostname(hostname, 64);
  pp = getprotobyname("tcp");
  hp = gethostbyname(hostname);
  
  setbuf (stdout, NULL);
  assert(pp && hp);
  
  fd = socket(AF_INET, SOCK_STREAM, pp->p_proto);
  assert(fd != SOCKET_ERROR);
  
  lAddr.sin_family = hp->h_addrtype;
  memcpy(&lAddr.sin_addr.s_addr, hp->h_addr, sizeof(lAddr.sin_addr.s_addr));
  lAddr.sin_port = htons(12345);
  
  i = bind(fd, (struct sockaddr *)&lAddr, sizeof(lAddr));
  assert(i != SOCKET_ERROR);
  
  printf("listening to port %d host %s (%s)\n", ntohs(lAddr.sin_port),
	 hostname, inet_ntoa(lAddr.sin_addr));
  i = listen(fd, 5);
  assert(i != SOCKET_ERROR);
  
  i = sizeof(rAddr);
  memset(&rAddr, 0, sizeof(rAddr));
  fd2 = accept(fd, (struct sockaddr *)&rAddr, (SOCKLEN_T *) &i);
  assert(fd2 != SOCKET_ERROR);
  
  printf("got connection from %s\n", inet_ntoa(rAddr.sin_addr));
  
#ifdef _WIN32
  {
    u_long on = 1;
    i = ioctlsocket (fd2, FIONBIO, &on);
  }
#else
  /* Add O_NONBLOCK without clobbering the other file status flags. */
  i = fcntl(fd2, F_SETFL, fcntl(fd2, F_GETFL, 0) | O_NONBLOCK);
#endif
  assert(i != SOCKET_ERROR);

  printf("accepted socket is nonblocking now\n");
  
  datalen = argc > 1 ? strtol (argv[1], NULL, 0) : 100000000;
  data = (char *) malloc(datalen);
  assert(data);
  printf("buffer size is %lu bytes\n", (unsigned long) datalen);

  datapos = 0;

  while (datapos < datalen)
    {
      fd_set wfds;
      size_t chunk;

      FD_ZERO(&wfds);
      FD_SET(fd2, &wfds);

      i = select(fd2 + 1, NULL, &wfds, NULL, NULL);
      assert(i == 1);

      /* With a second argument, write in 10,000-byte chunks, but never
	 past the end of the buffer (the last chunk may be shorter). */
      chunk = datalen - datapos;
      if (argc > 2 && chunk > 10000)
	chunk = 10000;
      printf("trying to write %d bytes\n", (int) chunk);

#if 0 // Same effect as send() on Windows, not available on Unix
      {
	DWORD ret;
	WSABUF iov[1];
	iov[0].buf = data + datapos;
	iov[0].len = (u_long) chunk;
	i = WSASendTo (fd2, iov, 1, &ret, 0, NULL, 0, NULL, NULL);
	if (i != SOCKET_ERROR)
	  i = ret;
      }
#else
      i = send (fd2, data + datapos, (int) chunk, 0);
#endif

      if (i == SOCKET_ERROR)
	{
	  printf ("Err: %d\n", WSAGetLastError ());
	  break;
	}
      else
	printf("%d bytes written\n", i);

      datapos += i;
      assert(datapos <= datalen);
    }
  shutdown (fd2, SD_BOTH);
  closesocket (fd2);
  printf("hit return to exit ");
  getchar();
  WSACleanup ();
  return 0;
}

* Re: VM and non-blocking writes
  2007-12-16 13:42           ` Corinna Vinschen
@ 2007-12-16 14:07             ` Corinna Vinschen
  2007-12-17 18:24               ` Lev Bishop
  0 siblings, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-16 14:07 UTC (permalink / raw)
  To: cygwin

On Dec 16 14:42, Corinna Vinschen wrote:
>   I'm contemplating working around this problem in Cygwin (not
>   for 1.5.25, but in the main trunk) by capping the number of bytes in a
>   single send call, according to the patch Lev sent in
>   http://www.cygwin.com/ml/cygwin-patches/2006-q2/msg00031.html.
> 
>   Lev, are you interested in reworking your patch (minus the pipe stuff)
>   to match current CVS?  Is there any gain in raising SO_SNDBUF/SO_RCVBUF
>   to a value > 8K, especially in the light of my experiences commented
>   on in net.cc, function fdsock()?

Lev, do you have a copyright assignment in place?  I don't find you on
my list of signers.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: VM and non-blocking writes
  2007-12-16 14:07             ` Corinna Vinschen
@ 2007-12-17 18:24               ` Lev Bishop
  2007-12-17 20:22                 ` Corinna Vinschen
  0 siblings, 1 reply; 20+ messages in thread
From: Lev Bishop @ 2007-12-17 18:24 UTC (permalink / raw)
  To: cygwin

On Dec 16, 2007 9:07 AM, Corinna Vinschen wrote:
> On Dec 16 14:42, Corinna Vinschen wrote:
> >   I'm contemplating working around this problem in Cygwin (not
> >   for 1.5.25, but in the main trunk) by capping the number of bytes in a
> >   single send call, according to the patch Lev sent in
> >   http://www.cygwin.com/ml/cygwin-patches/2006-q2/msg00031.html.
> >
> >   Lev, are you interested in reworking your patch (minus the pipe stuff)
> >   to match current CVS?  Is there any gain in raising SO_SNDBUF/SO_RCVBUF
> >   to a value > 8K, especially in the light of my experiences commented
> >   on in net.cc, function fdsock()?
>
> Lev, do you have a copyright assignment in place?  I don't find you on
> my list of signers.

No, I don't have a copyright assignment in place yet. I will see what I
can do about that -- don't think it will be a problem. I'd be
interested in reworking the patch against current CVS (though I
haven't looked to see how far current CVS has moved so I don't know
how much that will involve). But I have to warn you in advance that I
haven't had much time to work on this stuff, and I don't see that
situation changing any time soon, so it may take multiple weeks before
I get a chance. (I'll have some time over Christmas, but I'll be away
from all my network hardware and the openbsd box I originally used at
the other end of the wire for testing the patches, so testing would be
a problem). If you were hoping to get something into CVS on a more
rigorous timescale, better to push on without me -- I'll still try to
get a copyright assignment submitted, in case you wish to derive from
my original patches.

As for changing SO_SNDBUF/SO_RCVBUF, here are a few comments which I
originally wrote in response to your patch in fdsock().  You had
already #ifdef'd out the patch by the time I wrote them, so I never
bothered to send them:
<quote>
Your intention with the patch was to make cygwin's default buffer
sizes be more like on linux, but....
1) On windows/cygwin (without my patch), the interpretation of
so_sndbuf is very different from linux. The afd layer will accept
*any* size of send, so long as the current buffer position is less
than so_sndbuf. Whereas on linux, so_sndbuf limits the total size of
the send buffer. This works nicely for transaction-oriented apps. For
an app which does its side of a transaction in one large writev() and
then waits for the next request from the client (which will piggyback
the ack the server needs in order to empty its send buffer), the send
buffer on windows is effectively infinite, for all values of so_sndbuf
except 0. So so_sndbuf cannot really be compared between windows and
linux, because the interpretation is totally different.
2) Linux includes all the overheads of its skb structures, the part
of the buffer that's given to the application, etc, etc when it
accounts for the memory used by the send buffer, the result of which
is that you can only put about half as much data into the buffer as
there is memory allocated (linux internally doubles the number from
setsockopt(SO_SNDBUF) to hide this from applications expecting BSD
semantics, but it doesn't halve the number returned by getsockopt(), a
longstanding point of controversy). The upshot of this is that the
cygwin default send buffer should be *half* of the linux
tcp_wmem default, if you are going to go that way.
3) Linux does dynamic autotuning on the buffers, so the middle value
in tcp_wmem is more like a hint on what's a convenient chunk of memory
to allocate in one go, rather than a hint on what's actually the best
size for the buffer.
4) Your implementation ignored that some users may have actually
calculated optimal values for their situation and put them in the
relevant registry parameters. It seems it would be best either to
only set so_{snd,rcv}buf when the registry parameters are absent,
or not to touch so_{snd,rcv}buf at all and just advise users
experiencing problems that the registry parameters have the desired
effect. I'm inclined to go with the latter.
</quote>
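
To make points 1 and 2 concrete, here are toy models (plain C, no
sockets) of how many bytes one non-blocking send accepts under each
interpretation of SO_SNDBUF.  The function names are made up, and real
stacks also account for ACKs, autotuning and per-packet overhead, so
treat this as an illustration of the description above, not kernel code.

```c
#include <stddef.h>

/* Toy models: how many bytes one non-blocking send() accepts when
   `queued` bytes are already waiting in the send buffer. */

/* Point 1, Windows AFD as described above: any size is accepted as long
   as the current buffer position is below SO_SNDBUF. */
size_t afd_accepts(size_t queued, size_t sndbuf, size_t len)
{
  return queued < sndbuf ? len : 0;   /* 0 stands for WSAEWOULDBLOCK */
}

/* Linux-like, as described above: SO_SNDBUF bounds the total amount
   buffered, so a large send is cut down to the free space. */
size_t bounded_accepts(size_t queued, size_t sndbuf, size_t len)
{
  size_t room = queued < sndbuf ? sndbuf - queued : 0;
  return len < room ? len : room;     /* 0 stands for EAGAIN */
}

/* Point 2: Linux doubles the setsockopt() value to cover its overhead,
   so a Cygwin default holding the same payload would be half of the
   tcp_wmem default (middle) value. */
size_t cygwin_default_from_wmem(size_t tcp_wmem_default)
{
  return tcp_wmem_default / 2;
}
```

With SO_SNDBUF at 8192 and nothing queued, afd_accepts(0, 8192,
100000000) returns the full 100000000, in line with the nbcheck-nat run
earlier in the thread, while bounded_accepts() returns 8192.  For an
example tcp_wmem middle value of 16384, cygwin_default_from_wmem()
gives 8192.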

Having said all that, the winsock default of 8KB really is far too small
for many situations. I find that in my tests (this may be network
hardware/driver dependent) I need 32KB for the stack to start
coalescing packets reliably. Based on this, and on the problems
described in your comments of net.cc fdsock() where the issue was with
a 64KB buffer size, it seems that 32KB would be a good size to use
(again, it's possibly better to recommend the user to alter his
registry setting to 32KB, rather than have cygwin force it through
setsockopt()).

Before getting too set on the plan of having cygwin break
applications' send()s into chunks, maybe it's worth reconsidering the
overall strategy. We're basically at this point implementing our best
attempt at BSD semantics on top of Microsoft's half-assed attempt at
BSD semantics on top of the native not-BSD-like-at-all but powerful
and quite self-consistent NT semantics. If we keep having to work
around more issues like this, perhaps we'd be better off bypassing the
afd layer entirely, by setting SO_SNDBUF to 0, using overlapped IO,
and managing buffers ourselves. I'm sure this would bring its own set
of complications, but at least we'd be in a better position to deal
with them, not having to go through the afd layer. What do you think?

* Re: VM and non-blocking writes
  2007-12-17 18:24               ` Lev Bishop
@ 2007-12-17 20:22                 ` Corinna Vinschen
  2007-12-17 20:29                   ` Corinna Vinschen
  0 siblings, 1 reply; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-17 20:22 UTC (permalink / raw)
  To: cygwin

On Dec 17 12:28, Lev Bishop wrote:
> On Dec 16, 2007 9:07 AM, Corinna Vinschen wrote:
> > On Dec 16 14:42, Corinna Vinschen wrote:
> > >   Lev, are you interested in reworking your patch (minus the pipe stuff)
> > >   to match current CVS?  Is there any gain in raising SO_SNDBUF/SO_RCVBUF
> > >   to a value > 8K, especially in the light of my experiences commented
> > >   on in net.cc, function fdsock()?
> >
> > Lev, do you have a copyright assignment in place?  I don't find you on
> > my list of signers.
> 
> No I don't have a copyright assignment in place yet. I will see what I
> can do about that -- don't think it will be a problem. I'd be
> interested in reworking the patch against current CVS (though I
> haven't looked to see how far current CVS has moved so I don't know
> how much that will involve). But I have to warn you in advance that I
> haven't had much time to work on this stuff, and I don't see that
> situation changing any time soon, so it may take multiple weeks before
> I get a chance [...]

It's not time-critical.  The next major release will take some more
time anyway.  If you could just send the copyright assignment, that would
be a good start.

> Having said all that, the winsock default of 8KB really is far too small
> for many situations. I find that in my tests (this may be network
> hardware/driver dependent) I need 32KB for the stack to start
> coalescing packets reliably. Based on this, and on the problems
> described in your comments of net.cc fdsock() where the issue was with
> a 64KB buffer size, it seems that 32KB would be a good size to use
> (again, it's possibly better to recommend the user to alter his
> registry setting to 32KB, rather than have cygwin force it through
> setsockopt()).

We could just check whether the parameter is still set to 8K and only change
it to 32K if so.  Since that already happens at socket creation time,
it doesn't affect later settings in the application anyway, does it?
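
A sketch of that check, written against the POSIX socket API rather
than Winsock (whose getsockopt/setsockopt take char * and int *
arguments); the 8K/32K constants come from this discussion and the
helper name maybe_raise_sndbuf is hypothetical.  Note that on Linux
getsockopt reports the doubled value (Lev's point 2), so an equality
test against 8192 would not mean the same thing there as on Windows.

```c
#include <sys/socket.h>

/* The 8K Winsock default and the 32K value suggested in this thread. */
#define OLD_DEFAULT_SNDBUF 8192
#define NEW_DEFAULT_SNDBUF 32768

/* Hypothetical helper, run once right after socket creation: raise
   SO_SNDBUF to 32K only if it still holds the 8K default, leaving any
   value the user tuned (e.g. through the registry) alone.
   Returns 0 on success, -1 with errno set on failure. */
int maybe_raise_sndbuf(int fd)
{
  int cur;
  socklen_t len = sizeof cur;

  if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &cur, &len) != 0)
    return -1;
  if (cur != OLD_DEFAULT_SNDBUF)
    return 0;                    /* not the stock default: keep it */
  cur = NEW_DEFAULT_SNDBUF;
  return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &cur, sizeof cur);
}
```

Because this runs before the application sees the socket, a later
setsockopt(SO_SNDBUF) from the application still overrides it.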

> Before getting too set on the plan of having cygwin break
> applications' send()s into chunks, maybe it's worth reconsidering the
> overall strategy. We're basically at this point implementing our best
> attempt at BSD semantics on top of Microsoft's half-assed attempt at
> BSD semantics on top of the native not-BSD-like-at-all but powerful
> and quite self-consistent NT semantics. If we keep having to work
> around more issues like this, perhaps we'd be better off bypassing the
> afd layer entirely, by setting SO_SNDBUF to 0, using overlapped IO,
> and managing buffers ourselves. I'm sure this would bring its own set
> of complications, but at least we'd be in a better position to deal
> with them, not having to go through the afd layer. What do you think?

Sorry, I'm unfamiliar with the native NT socket interface :} Is there
a (good) tutorial somewhere for the native NT socket stuff?  Even
without using the native API, we could also just set the Winsock
SO_RCVBUF/SO_SNDBUF settings to 0 and intercept the setsockopt/getsockopt
calls to maintain our own buffers, right?
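
One way to picture that alternative: the emulation layer keeps its own
per-socket queue whose size is what the application sees as SO_SNDBUF,
accepts short counts BSD-style, and drains the queue as the transport
(for instance an overlapped send with the Winsock SO_SNDBUF at 0)
completes.  The struct, the fixed 64K storage and all function names
below are hypothetical, and the transport side is left out.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-socket send queue kept by the emulation layer. */
struct sendq {
  char buf[65536];
  size_t used;     /* bytes queued, not yet accepted by the transport */
  size_t cap;      /* emulated SO_SNDBUF, at most sizeof buf */
};

/* Intercepted setsockopt(SO_SNDBUF): clamp to the storage we have. */
void sendq_set_cap(struct sendq *q, size_t cap)
{
  q->cap = cap < sizeof q->buf ? cap : sizeof q->buf;
}

/* BSD-style non-blocking send: copy as much as fits, return a short
   count, and fail with EAGAIN only when the queue is completely full. */
ssize_t sendq_write(struct sendq *q, const void *data, size_t len)
{
  size_t room = q->cap > q->used ? q->cap - q->used : 0;

  if (room == 0)
    {
      errno = EAGAIN;
      return -1;
    }
  if (len > room)
    len = room;
  memcpy(q->buf + q->used, data, len);
  q->used += len;
  return (ssize_t) len;
}

/* Called when the transport has taken `n` bytes: drop them from the
   front of the queue. */
void sendq_consume(struct sendq *q, size_t n)
{
  if (n > q->used)
    n = q->used;
  memmove(q->buf, q->buf + n, q->used - n);
  q->used -= n;
}

/* What select() for writing would report for this socket. */
int sendq_writable(const struct sendq *q)
{
  return q->used < q->cap;
}
```
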

Having said that, for a start I would prefer a simple "upgrade" along
the lines of your previous patch.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: VM and non-blocking writes
  2007-12-17 20:22                 ` Corinna Vinschen
@ 2007-12-17 20:29                   ` Corinna Vinschen
  0 siblings, 0 replies; 20+ messages in thread
From: Corinna Vinschen @ 2007-12-17 20:29 UTC (permalink / raw)
  To: cygwin

On Dec 17 19:24, Corinna Vinschen wrote:
> On Dec 17 12:28, Lev Bishop wrote:
> >  If we keep having to work
> > around more issues like this, perhaps we'd be better off bypassing the
> > afd layer entirely, by setting SO_SNDBUF to 0, using overlapped IO,
> > and managing buffers ourselves. I'm sure this would bring its own set
> > of complications, [...]
> 
> Sorry, I'm unfamiliar with the native NT socket interface :} Is there
> a (good) tutorial somewhere for the native NT socket stuff?  Even
> without using the native API, we could also just set the Winsock
> SO_RCVBUF/SO_SNDBUF settings to 0 and intercept the setsockopt/getsockopt
> calls to maintain our own buffers, right?

On re-reading, my reply seems a bit off-track.  You're suggesting using
SO_SNDBUF==0 with overlapped I/O.  I'm asking about keeping the standard
nonblocking semantics while maintaining our own per-socket buffer.

At one point the socket stuff was implemented using overlapped I/O, but
I had serious trouble with that.  The overlapped code waited for the
socket operation to complete in a WaitForMultipleEvents call.  When a
signal arrived, I canceled the I/O operation using CancelIo.  The
problem was that the send operation is not atomic, and there is no way
to find out how many bytes from the current send buffer have actually
been sent.  So it was not possible to return the correct number of sent
bytes to the application.  Instead, the code always returned EINTR.
This in turn could result in data corruption or lost connections.

If there is some way to accomplish that, for instance in the native API,
then we could revert to overlapped I/O.  If not, well...
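
The contract at stake is the POSIX one: a send() interrupted by a
signal after some data went out returns the partial count, and EINTR is
reserved for the case where nothing was sent.  Application code written
against that rule typically looks like the loop below (blocking stream
socket assumed; send_all is a hypothetical name), which is why an EINTR
after a partial transfer makes it resend bytes the peer already has.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send all `len` bytes on a blocking stream socket.  Relies on the
   POSIX rule that an interrupted send() returns the partial count and
   fails with EINTR only when nothing was sent, so retrying after EINTR
   neither loses nor repeats data. */
ssize_t send_all(int fd, const char *buf, size_t len)
{
  size_t done = 0;

  while (done < len)
    {
      ssize_t n = send(fd, buf + done, len - done, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;            /* nothing sent; safe to retry */
          return -1;
        }
      done += (size_t) n;        /* short count: resume after last byte */
    }
  return (ssize_t) done;
}
```
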

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

end of thread, other threads:[~2008-01-07  2:45 UTC | newest]

Thread overview: 20+ messages
2007-12-13 17:31 VM and non-blocking writes Wayne Christopher
2007-12-13 17:45 ` Dave Korn
2007-12-13 17:59 ` Corinna Vinschen
2007-12-13 19:16   ` Wayne Christopher
2007-12-14 11:15     ` Corinna Vinschen
2007-12-14 13:41       ` Corinna Vinschen
2007-12-14 13:52         ` Corinna Vinschen
2007-12-14 14:35           ` Lev Bishop
2007-12-14 17:25             ` Corinna Vinschen
2007-12-14 21:56               ` mmap failing Wayne Christopher
2007-12-16 11:04                 ` Corinna Vinschen
2007-12-17 18:47                   ` Wayne Christopher
2008-01-07 16:03                 ` Linda Walsh
2007-12-14 14:33       ` VM and non-blocking writes Corinna Vinschen
2007-12-15 17:39         ` Robert Pendell
2007-12-16 13:42           ` Corinna Vinschen
2007-12-16 14:07             ` Corinna Vinschen
2007-12-17 18:24               ` Lev Bishop
2007-12-17 20:22                 ` Corinna Vinschen
2007-12-17 20:29                   ` Corinna Vinschen
