Date: Mon, 17 Dec 2007 18:24:00 -0000
From: "Lev Bishop"
To: cygwin@cygwin.com
Subject: Re: VM and non-blocking writes
In-Reply-To: <20071216140719.GF18860@calimero.vinschen.de>
References: <47616D31.7090002@4raccoons.com> <20071213175934.GB25863@calimero.vinschen.de> <476185AF.5000906@4raccoons.com> <20071214111508.GD25863@calimero.vinschen.de> <20071214143230.GK25863@calimero.vinschen.de> <20071216134221.GD18860@calimero.vinschen.de> <20071216140719.GF18860@calimero.vinschen.de>
X-SW-Source: 2007-12/txt/msg00394.txt.bz2

On Dec 16, 2007 9:07 AM, Corinna Vinschen wrote:
> On Dec 16 14:42, Corinna Vinschen wrote:
> > I'm contemplating the idea to work around this problem in Cygwin (not
> > for 1.5.25, but in the main trunk) by capping the number of bytes in a
> > single send call, according to the patch Lev sent in
> > http://www.cygwin.com/ml/cygwin-patches/2006-q2/msg00031.html.
> >
> > Lev, are you interested in reworking your patch (minus the pipe stuff)
> > to match current CVS?  Is there any gain in raising SO_SNDBUF/SO_RCVBUF
> > to a value > 8K, especially in the light of my experiences commented
> > on in net.cc, function fdsock()?
>
> Lev, do you have a copyright assignment in place?  I don't find you on
> my list of signers.

No, I don't have a copyright assignment in place yet. I will see what I
can do about that -- I don't think it will be a problem.

I'd be interested in reworking the patch against current CVS (though I
haven't looked to see how far current CVS has moved, so I don't know
how much that will involve). But I have to warn you in advance that I
haven't had much time to work on this stuff, and I don't see that
situation changing any time soon, so it may take several weeks before I
get a chance. (I'll have some time over Christmas, but I'll be away
from all my network hardware and the OpenBSD box I originally used at
the other end of the wire for testing the patches, so testing would be
a problem.) If you were hoping to get something into CVS on a tighter
timescale, better to push on without me -- I'll still try to get a
copyright assignment submitted, in case you wish to derive from my
original patches.

As for changing SO_SNDBUF/SO_RCVBUF, here are a few comments that I
originally wrote in response to your patch in fdsock(). You had already
#ifdef'd out the patch by the time I wrote them, so I never bothered to
send them.

Your intention with the patch was to make Cygwin's default buffer sizes
more like those on Linux, but...

1) On Windows/Cygwin (without my patch), the interpretation of
so_sndbuf is very different from Linux. The afd layer will accept *any*
size of send, so long as the current buffer position is less than
so_sndbuf, whereas on Linux so_sndbuf limits the total size of the send
buffer. The Windows behaviour works nicely for transaction-oriented
apps.
For an app that does its side of a transaction in one large writev()
and then waits for the next request from the client (which will
piggyback the ack the server needs in order to empty its send buffer),
the send buffer on Windows is effectively infinite, for all values of
so_sndbuf except 0. So so_sndbuf cannot really be compared between
Windows and Linux, because the interpretation is totally different.

2) Linux includes all the overheads of its skb structures, the part of
the buffer that's given to the application, etc. when it accounts for
the memory used by the send buffer. The result is that you can only put
about half as much data into the buffer as there is memory allocated.
(Linux internally doubles the number from setsockopt(SO_SNDBUF) to hide
this from applications expecting BSD semantics, but it doesn't halve
the number returned by getsockopt(), a longstanding point of
controversy.) The upshot is that the Cygwin default send buffer should
be *half* of the Linux tcp_wmem default, if you are going to go that
way.

3) Linux does dynamic autotuning on the buffers, so the middle value in
tcp_wmem is more like a hint on what's a convenient chunk of memory to
allocate in one go, rather than a hint on what's actually the best size
for the buffer.

4) Your implementation ignored that some users may have actually
calculated optimal values for their situation and put them in the
relevant registry parameters. It seems it would be best either to set
so_{snd,rcv}buf only when the registry parameters are absent, or not to
touch so_{snd,rcv}buf at all and just advise users experiencing
problems that the registry parameters have the desired effect. I'm
inclined to go with the latter.

Having said all that, the Winsock default of 8kb really is far too
small for many situations. I find that in my tests (this may be network
hardware/driver dependent) I need 32kb for the stack to start
coalescing packets reliably.
Based on this, and on the problems described in your comments in net.cc
fdsock(), where the issue was with a 64kb buffer size, it seems that
32kb would be a good size to use (again, it's possibly better to
recommend that users alter their registry setting to 32kb, rather than
have Cygwin force it through setsockopt()).

Before getting too set on the plan of having Cygwin break applications'
send()s into chunks, maybe it's worth reconsidering the overall
strategy. We're basically at this point implementing our best attempt
at BSD semantics on top of Microsoft's half-assed attempt at BSD
semantics on top of the native not-BSD-like-at-all but powerful and
quite self-consistent NT semantics. If we keep having to work around
more issues like this, perhaps we'd be better off bypassing the afd
layer entirely, by setting SO_SNDBUF to 0, using overlapped IO, and
managing buffers ourselves. I'm sure this would bring its own set of
complications, but at least we'd be in a better position to deal with
them, not having to go through the afd layer.

What do you think?
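
For illustration only, here is a minimal sketch of the "cap the number
of bytes in a single send call" idea discussed above. It is not the
Cygwin patch: it is written in Python rather than in Cygwin's own
source, the names capped_send and send_all_capped are hypothetical, and
the 32 KB default cap is simply the figure mentioned in this message,
taken here as an assumption.

```python
import select
import socket


def capped_send(sock, data, cap=32 * 1024):
    """Hand at most `cap` bytes to one send() call on a non-blocking socket.

    Returns the number of bytes the stack accepted, or 0 if the call
    would have blocked. (Hypothetical helper, for illustration only.)
    """
    try:
        return sock.send(memoryview(data)[:cap])
    except BlockingIOError:
        return 0


def send_all_capped(sock, data, cap=32 * 1024, timeout=5.0):
    """Send all of `data` in pieces of at most `cap` bytes.

    Waits for the socket to become writable before each send() and
    returns the list of byte counts accepted by the individual calls.
    """
    view = memoryview(data)
    sent = 0
    calls = []  # per-call byte counts, kept only so the caller can inspect them
    while sent < len(view):
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("socket did not become writable")
        n = capped_send(sock, view[sent:], cap)
        calls.append(n)
        sent += n
    return calls
```

The point of the cap is that no single send() can push more than `cap`
bytes past the buffer limit, whatever size the application asked for;
the application sees an ordinary short write and retries with the rest.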