To: 'Ken Brown'
Subject: RE: AF_UNIX/SOCK_DGRAM is dropping messages
Date: Tue, 6 Apr 2021 16:50:56 +0200
List-Id: General Cygwin discussions and problem reports

> >>>>>> Using AF_UNIX/SOCK_DGRAM with current version (3.2.0) seems to
> >>>>>> drop messages or at least they are not received in the same order
> >>>>>> they are sent
> >>
> >> [snip]
> >>
> >>> Thanks for the test case. I can confirm the problem. I'm not
> >>> familiar enough with the current AF_UNIX implementation to debug
> >>> this easily. I'd rather spend my time on the new implementation (on
> >>> the topic/af_unix branch). It turns out that your test case fails
> >>> there too, but in a completely different way, due to a bug in sendto
> >>> for datagrams. I'll see if I can fix that bug and then try again.
> >>>
> >>> Ken
> >>
> >> Ok, too bad it wasn't our own code base but good that the "mystery"
> >> is verified
> >>
> >> I finally succeeded in building topic/af_unix (after finding out what
> >> version of zlib was needed), but not with -D__WITH_AF_UNIX in
> >> CXXFLAGS though, and thus I haven’t tested it yet
> >>
> >> Is it sufficient to add the define to the "main" Makefile or do you
> >> have to add it to all the Makefiles? I guess I can find out though
> >
> > I do it on the configure line, like this:
> >
> > ../af_unix/configure CXXFLAGS="-g -O0 -D__WITH_AF_UNIX" --prefix=...
> >
> >> Is topic/af_unix fairly up to date with the master branch?
> >
> > Yes, I periodically cherry-pick commits from master to topic/af_unix.
> > I'll do that again right now.
> >
> >> Either way, I'll be glad to help out testing topic/af_unix
> >
> > Thanks!
>
> I've now pushed a fix for that sendto bug, and your test case runs without
> error on the topic/af_unix branch.

It seems the test case does work now on topic/af_unix in blocking mode,
but in non-blocking mode (recv() with MSG_DONTWAIT) there are, I think,
some issues:

1. When the queue is empty, a non-blocking recv() sets errno to EPIPE,
   but I think it should be EAGAIN (or maybe the pipe really is getting
   broken for some reason?)
2. When using non-blocking recv() and no message is written at all,
   recv() seems to block forever
3. When using non-blocking recv() and the "client" sends fewer than
   "count" messages, recv() sometimes blocks forever as well

My naïve analysis is that for the first issue (if it is one) the wrong
errno is set, and for the second issue recv() blocks if no sendto() is
done after the first recv(), i.e. nothing kicks the "reader thread" in
the butt to realise the queue is empty. It is not entirely clear what
POSIX says about creating a blocking descriptor and then passing
non-blocking flags to recv(), but this works on Linux anyway.

Let me know if I should provide a more specific explanation, but I think
minor modifications of the test case can provoke all of these behaviours.
I think 2 and 3 have the same cause, though (as described above).

> By the way, I think the implementation of sendto/recv for datagrams is very
> inefficient when there are repeated calls to sendto as in your test case.
> Nevertheless, your test case actually runs slightly faster on the topic/af_unix
> branch than it does on master (when the latter succeeds, which it does about
> half the time for me). So I'm not sure whether it's worth worrying about this.

Of course we would like the best throughput possible 😉

> Here's the issue, briefly. The communication is done via a Windows named pipe.
> The receiver creates the pipe when it creates and binds its socket. It creates
> only one pipe instance. The sender connects to the pipe, writes, and closes its
> handle. But the pipe is not available for another sender to connect to until the
> receiver reads the message, after which it disconnects the sender.
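To make the handoff described just above concrete, here is a toy model in C++17. It is not Cygwin's code, and `SingleInstancePipe` and `run_model` are hypothetical names: the receiver's single pipe instance is treated as a slot holding at most one message, a sender must wait for the slot to be free before it can "connect" and write, and the receiver frees the slot when it reads the message and "disconnects" the sender. It only illustrates why back-to-back sendto() calls end up serialized on the receiver's reads; it does not model real named-pipe buffering or timing.

```cpp
#include <cassert>  // for checking the results, e.g. in a test
#include <condition_variable>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Toy model (hypothetical, not Cygwin code): the receiver's single
// named-pipe instance behaves like a slot that holds at most one message.
class SingleInstancePipe {
public:
    // Sender: wait until the instance is free ("connect"), write, return.
    // The sender closes its handle without waiting for the read itself.
    void send(int message) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !slot_.has_value(); });
        slot_ = message;
        changed_.notify_all();
    }

    // Receiver: wait for a message, read it, then "disconnect" the sender,
    // which frees the instance for the next sender.
    int receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return slot_.has_value(); });
        const int message = *slot_;
        slot_.reset();
        changed_.notify_all();
        return message;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::optional<int> slot_;  // empty == instance free for a new sender
};

// Sends 0..n-1 from a separate thread and returns what the receiver read.
std::vector<int> run_model(int n) {
    SingleInstancePipe channel;
    std::thread sender([&channel, n] {
        for (int i = 0; i < n; ++i) channel.send(i);
    });
    std::vector<int> received;
    for (int i = 0; i < n; ++i) received.push_back(channel.receive());
    sender.join();
    return received;
}
```

In this model run_model(1000) delivers the values in order, but never more than one is in flight at a time: every send() after the first has to wait for a receive(), which is the cost Ken points at for repeated sendto() calls.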
OK, in our application we will use long-lived descriptors and multiple
writers that may send large business messages (chunked into smaller
pieces per sendto()/recv()).

> Ken

Best regards,
Kristian

Attachment: af_unix.cpp

#include <sys/socket.h>
#include <sys/un.h>

// Override the AF_UNIX value from <sys/socket.h>
#undef AF_UNIX
#define AF_UNIX 31

#include <cstdio>
#include <cstring>
#include <unistd.h>
#include <thread>
#include <chrono>

// $ g++ --std=gnu++17 af_unix.cpp

const char* const path = "address";
const int count = 10000;
const int size = BUFSIZ * 8;

int client()
{
   const int fd = socket( AF_UNIX, SOCK_DGRAM, 0);

   if( fd == -1)
   {
      perror( "socket error");
      return -1;
   }

   struct sockaddr_un address{};
   strcpy( address.sun_path, path);
   address.sun_family = AF_UNIX;

   char buffer[size] = {};

   // Sends 100 datagrams, i.e. fewer than 'count'
   for( int idx = 0; idx < 100; ++idx)
   {
      // Tag each datagram with its sequence number
      memcpy( buffer, &idx, sizeof idx);

      const ssize_t result = sendto( fd, buffer, size, 0, (struct sockaddr*)&address, sizeof address);

      // Assume the whole chunk can be sent
      if( result != size)
      {
         perror( "sendto error");
         return -1;
      }
   }

   close( fd);

   return 0;
}

int server()
{
   const int fd = socket( AF_UNIX, SOCK_DGRAM, 0);

   if( fd == -1)
   {
      perror( "socket error");
      return -1;
   }

   struct sockaddr_un address{};
   strcpy( address.sun_path, path);
   address.sun_family = AF_UNIX;

   const int result = bind( fd, (struct sockaddr*)&address, sizeof address);

   if( result == -1)
   {
      perror( "bind error");
      return -1;
   }

   return fd;
}

int main( int argc, char* argv[])
{
   const int fd = server( );

   if( fd != -1)
   {
      fprintf( stdout, "%d\tnumber of packages\n", count);
      fprintf( stdout, "%d\tbytes per package\n", size);

      std::thread{ [&](){client( );}}.detach();

      std::this_thread::sleep_for( std::chrono::microseconds( 500));

      char buffer[size] = {};

      for( int idx = 0; idx < count; ++idx)
      {
         const ssize_t result = recv( fd, buffer, size, MSG_DONTWAIT);

         // Assume the whole chunk can be read
         if( result != size)
         {
            perror( "recv error");
            //fprintf( stderr, "index: %d\n", idx);
            unlink( path);
            return -1;
         }

         // Check that datagrams arrive in the order they were sent
         int index = 0;
         memcpy( &index, buffer, sizeof index);

         if( index != idx)
         {
            fprintf( stderr, "expected %d but got %d\n", idx, index);
            unlink( path);
            return -1;
         }
      }

      close( fd);
      unlink( path);
   }

   return 0;
}
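For comparison with issues 1 to 3, here is a minimal sketch of what recv() with MSG_DONTWAIT does on a plain AF_UNIX datagram socket on Linux. It assumes the standard AF_UNIX value (not the 31 override in the test case) and uses socketpair() instead of bind()/sendto() so no filesystem path is needed. Both descriptors are created in blocking mode, so this is the "blocking descriptor plus non-blocking flag" case. `check_dontwait` and `DontwaitResult` are hypothetical names.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>  // for checking the results, e.g. in a test
#include <cerrno>

// Hypothetical result holder: errno values are -1 if recv() did not fail.
struct DontwaitResult {
    int empty_errno;     // errno from recv() on an empty queue
    ssize_t first_size;  // bytes returned for the one queued datagram
    int drained_errno;   // errno from recv() once the queue is empty again
};

DontwaitResult check_dontwait() {
    DontwaitResult r{-1, -1, -1};
    int sv[2];
    // Both ends are ordinary blocking descriptors (no SOCK_NONBLOCK).
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1) return r;

    char buf[64];

    // 1. Nothing sent yet: recv() must return at once with -1/EAGAIN.
    if (recv(sv[1], buf, sizeof buf, MSG_DONTWAIT) == -1) r.empty_errno = errno;

    // 2. Queue one datagram; the non-blocking recv() returns it whole.
    const char msg[] = "hello";
    send(sv[0], msg, sizeof msg, 0);
    r.first_size = recv(sv[1], buf, sizeof buf, MSG_DONTWAIT);

    // 3. Queue drained again: back to -1/EAGAIN rather than blocking.
    if (recv(sv[1], buf, sizeof buf, MSG_DONTWAIT) == -1) r.drained_errno = errno;

    close(sv[0]);
    close(sv[1]);
    return r;
}
```

On Linux both errno values should be EAGAIN (equal to EWOULDBLOCK there) and first_size should be 6, the length of "hello" including its terminating NUL. Whether socketpair() goes through the same code path as the bind()/sendto() setup on topic/af_unix is an assumption that has not been checked here.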