From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 50678 invoked by alias); 1 Jun 2015 15:03:48 -0000 Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner@cygwin.com Mail-Followup-To: cygwin@cygwin.com Received: (qmail 50646 invoked by uid 89); 1 Jun 2015 15:03:47 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mout.gmx.net Received: from mout.gmx.net (HELO mout.gmx.net) (212.227.17.20) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Mon, 01 Jun 2015 15:03:44 +0000 Received: from [20.133.40.85] by msvc-mesg-gmx006.server.lan (via HTTP); Mon, 1 Jun 2015 17:03:40 +0200 MIME-Version: 1.0 Message-ID: From: Theodor.Kazhan@gmx.de To: Theodor.Kazhan@gmx.de, cygwin@cygwin.com Subject: Re: cygwin on W7: stalled scp (openssh-6.8p1), lost ssh-scp-pipe data Content-Type: text/plain; charset=UTF-8 Date: Mon, 01 Jun 2015 15:03:00 -0000 Content-Transfer-Encoding: quoted-printable X-UI-Out-Filterresults: notjunk:1; X-SW-Source: 2015-06/txt/msg00008.txt.bz2 Hi folks, unfortunately, my issue seems to be not interesting enough to trigger someo= nes attention... ;) Was my issue well described or do you have further questions? I'd appreciat= e any help leading me forward, e.g. maybe some hints in howto instrument cy= gwin sources to be able to further trace the data through and debug into th= e cygwins write / read functions... Thanks, Theo -----Urspr=C3=BCngliche Nachricht----- Gesendet: Freitag, 22 Mai 2015 um 09:52:20 Uhr Von: Theodor.Kazhan@gmx.de An: cygwin@cygwin.com Betreff: cygwin on W7: stalled scp (openssh-6.8p1), lost ssh-scp-pipe data Hi ML-followers, I occasionally observe stalled scp connections while copying logfile archiv= es from a debian server to cygwin on Windows 7. Besides https://sourceware.= org/ml/cygwin/2015-02/msg00575.html I did not find similar issues, but ther= e is no blocking/non-blocking switch involved in the middle of the data tra= nsmission (at random places in subsequent tests) through the pipe, where so= me data is lost (pls see below). Pls find the redacted output of my "cygcheck -svr" attached. I also noticed= that issue also before upgrading my cygwin to that latest version. As I'm not into the details of read/write functions, I was only able to tra= ck down the issue as follows - I hope it is reproducible or otherwise helpf= ul - so now I need your help: 1) I added some instrumentation in scp/ssh (both binaries copied to /usr/lo= cal/bin/ and used below): user@host /usr/src/openssh-6.8p1-1.src/openssh-6.8p1-debug $ grep -n -C4 TK: *.c channels.c-1724- channels.c-1725- if (c->datagram) { channels.c-1726- /* ignore truncated writes, datagra= ms might get lost */ channels.c-1727- len =3D write(c->wfd, buf, dlen); channels.c:1728:logit("TK: channels.c: channel_handle_wfd(1): write: len=3D= %d, buf=3D%.16s", len, buf); channels.c-1729- free(data); channels.c-1730- if (len < 0 && (errno =3D=3D EINTR = || errno =3D=3D EAGAIN || channels.c-1731- errno =3D=3D EWOULDBLOCK)) channels.c-1732- return 1; -- channels.c-1745- dlen =3D MIN(dlen, 8*1024); channels.c-1746-#endif channels.c-1747- channels.c-1748- len =3D write(c->wfd, buf, dlen); channels.c:1749:logit("TK: channels.c: channel_handle_wfd(2): write: len=3D= %d, buf=3D%.16s", len, buf); channels.c-1750- if (len < 0 && channels.c-1751- (errno =3D=3D EINTR || errno =3D=3D EAG= AIN || errno =3D=3D EWOULDBLOCK)) channels.c-1752- return 1; channels.c-1753- if (len <=3D 0) { -- channels.c-2368- return 0; channels.c-2369- channels.c-2370- /* Get the data. */ channels.c-2371- data =3D packet_get_string_ptr(&data_len); channels.c:2372:logit("TK: channels.c: channel_input_data: data_len=3D%d, b= uf=3D%.16s", data_len, data); channels.c-2373- win_len =3D data_len; channels.c-2374- if (c->datagram) channels.c-2375- win_len +=3D 4; /* string length header */ channels.c-2376- -- dispatch.c-107- return r; dispatch.c-108- if (type =3D=3D SSH_MSG_NONE) dispatch.c-109- return 0; dispatch.c-110- } dispatch.c:111:logit("TK: dispatch.c: ssh_dispatch_run: type=3D%d", type); dispatch.c-112- if (type > 0 && type < DISPATCH_MAX && dispatch.c-113- ssh->dispatch[type] !=3D NULL) { dispatch.c-114- if (ssh->dispatch_skip_packets) { dispatch.c-115- debug2("skipped packet (type %u)", = type); -- scp.c-1110- count +=3D amt; scp.c-1111- do { scp.c-1112- j =3D atomicio6(read, remin, cp, am= t, scp.c-1113- scpio, &statbytes); scp.c:1114:logit("TK: scp.c: sink: amt=3D%d, j=3D%d, i=3D%d, size=3D%d, cou= nt=3D%d, buf=3D%.16s", amt, j, i, size, count, cp); scp.c-1115- if (j =3D=3D 0) { scp.c-1116- run_err("%s", j !=3D EPIPE ? scp.c-1117- strerror(errno) : scp.c-1118- "dropped connection"); 2) Generated easily debuggable (numbered 16k ascii blocks) HUGEFILE on a de= bian server (file size taken from an original log archive): $ for BLOCK in `seq -w 1 1 94795`; do for ELEM in `seq -w 1 1 1024`; do ech= o -n "B0$BLOCK-N00$ELEM-"; done; done > TEST.txt 3) Pulled that file in a loop from the debian server to the local cygwin on= Windows 7: $ while true; do /usr/local/bin/scp -v -C -o BatchMode=3Dyes user@10.IP2.IP= 3.IP4:logDownload_m/TEST.txt 93_KO_TEST.txt 2>&1 | tee 93_KO_scp_trace.txt;= echo; date; echo; sleep 15; done 4) In case of stalled scp (in my env, it does not take that much time to ge= t it, maybe 10-to-30 tries), collect and compare the "KO" output to a previ= ously recorded and prepared "OK" output: $ split -b 163840 -a 5 93_KO_TEST.txt 93_KO_TEST.txt.split_ $ for FILE in 93_OK_TEST.txt.split_*; do echo "$FILE"; diff -q $FILE `echo = $FILE | sed -re "s/93_OK_/93_KO_/g"` || break; done . . . 93_OK_TEST.txt.split_aanqv 93_OK_TEST.txt.split_aanqw Files 93_OK_TEST.txt.split_aanqw and 93_KO_TEST.txt.split_aanqw differ $ sdiff <(cat 93_OK_TEST.txt.split_aanqw | sed -re "s/(B......-N001024-)/\1= \n/g") <(cat 93_KO_TEST.txt.split_aanqw | sed -re "s/(B......-N001024-)/\1\= n/g") B092261-N000001-B092261-N000002-B092261-N000003-B092261-N0000 B092261-N00= 0001-B092261-N000002-B092261-N000003-B092261-N0000 B092262-N000001-B092262-N000002-B092262-N000003-B092262-N0000 B092262-N00= 0001-B092262-N000002-B092262-N000003-B092262-N0000 B092263-N000001-B092263-N000002-B092263-N000003-B092263-N0000 B092263-N00= 0001-B092263-N000002-B092263-N000003-B092263-N0000 B092264-N000001-B092264-N000002-B092264-N000003-B092264-N0000 B092264-N00= 0001-B092264-N000002-B092264-N000003-B092264-N0000 B092265-N000001-B092265-N000002-B092265-N000003-B092265-N0000 B092265-N00= 0001-B092265-N000002-B092265-N000003-B092265-N0000 B092266-N000001-B092266-N000002-B092266-N000003-B092266-N0000 B092266-N00= 0001-B092266-N000002-B092266-N000003-B092266-N0000 B092267-N000001-B092267-N000002-B092267-N000003-B092267-N0000 < B092268-N000001-B092268-N000002-B092268-N000003-B092268-N0000 B092268-N00= 0001-B092268-N000002-B092268-N000003-B092268-N0000 B092269-N000001-B092269-N000002-B092269-N000003-B092269-N0000 B092269-N00= 0001-B092269-N000002-B092269-N000003-B092269-N0000 B092270-N000001-B092270-N000002-B092270-N000003-B092270-N0000 B092270-N00= 0001-B092270-N000002-B092270-N000003-B092270-N0000 > B092271-N00= 0001-B092271-N000002-B092271-N000003-B092271-N0000 ... So, the 16k block "B092267" is missing in the "KO" case, lets check the= scp/ssh logs, where we do have 3 steps where the HUGEFILE data is handled: a) Writing it to the channel buffer in the ssh process after reading it fro= m the socket and after decryption and decompression: "TK: channels.c: chann= el_input_data:" b) Reading it from the channel buffer and writing it to the pipe towards th= e scp parent process in ssh process: "TK: channels.c: channel_handle_wfd(2)= :" c) Reading it from the pipe in the scp process to write it to the target fi= le: "TK: scp.c: sink:" The logs show that the buffer is handled in a) and b), but does not show up= in c), it is simply lost... $ for TRACKER in TK:.channels.c:.channel_input_data: TK:.channels.c:.channe= l_handle_wfd.2.: TK:.scp.c:.sink:; do echo; cat 93_KO_scp_trace.txt | grep = "$TRACKER" | grep "B0922[67]"; done TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092260-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092261-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092262-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092263-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092264-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092265-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092266-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092267-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092268-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092269-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092270-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092271-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092272-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092273-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092274-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092275-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092276-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092277-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092278-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB092279-N000001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092260-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092261-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092262-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092263-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092264-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092265-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092266-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092267-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092268-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092269-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092270-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092271-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092272-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092273-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092274-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092275-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D32768, buf=3DB092276-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092278-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB092279-N0= 00001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D1511587840, size=3D1553121293,= count=3D65536, buf=3DB092261-N000001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D1511653376, size=3D1553121293,= count=3D65536, buf=3DB092265-N000001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D1511718912, size=3D1553121293,= count=3D65536, buf=3DB092270-N000001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D1511784448, size=3D1553121293,= count=3D65536, buf=3DB092274-N000001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D1511849984, size=3D1553121293,= count=3D65536, buf=3DB092278-N000001- The 3rd line of c) "TK: scp.c: sink:" should be "buf=3DB092269-N000001-" if= all of the 64k data of the 2nd line "count=3D65536, buf=3DB092265-N000001-= " would have been read correctly from the pipe, but it is: "buf=3DB092270-N= 000001-", i.e. one 16k block is written to the pipe by ssh ("TK: channels.c= : channel_handle_wfd(2): write: len=3D16384, buf=3DB092267-N000001-"), but = does not show up on teh other end of the pipe in scp. In case bigger blocks are written to the pipe by ssh (e.g. "len=3D32768"), = the bigger blocks get lost and does not show up in scp, e.g.: 93_KOOK_TEST.txt/93_OK_TEST.txt.split_aacmz VS 92_KO_TEST.txt/92_KO_TEST.tx= t.split_aacmz B016891-N000001-B016891-N000002-B016891-N000003-B016891-N0000 < B016892-N000001-B016892-N000002-B016892-N000003-B016892-N0000 < B016893-N000001-B016893-N000002-B016893-N000003-B016893-N0000 B016893-N00= 0001-B016893-N000002-B016893-N000003-B016893-N0000 . . .=20 B016900-N000001-B016900-N000002-B016900-N000003-B016900-N0000 B016900-N00= 0001-B016900-N000002-B016900-N000003-B016900-N0000 > B016901-N00= 0001-B016901-N000002-B016901-N000003-B016901-N0000 > B016902-N00= 0001-B016902-N000002-B016902-N000003-B016902-N0000 92_KO_TEST.txt/92_KO_scp_trace.txt TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB016890-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB016891-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB016892-N000001- TK: channels.c: channel_input_data: data_len=3D16384, buf=3DB016893-N000001- TK: channels.c: channel_handle_wfd(2): write: len=3D16384, buf=3DB016890-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D32768, buf=3DB016891-N0= 00001- TK: channels.c: channel_handle_wfd(2): write: len=3D32768, buf=3DB016893-N0= 00001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D276627456, size=3D1553121293, = count=3D65536, buf=3DB016885-N000001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D276692992, size=3D1553121293, = count=3D65536, buf=3DB016889-N000001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D276758528, size=3D1553121293, = count=3D65536, buf=3DB016895-N000001- TK: scp.c: sink: amt=3D65536, j=3D65536, i=3D276824064, size=3D1553121293, = count=3D65536, buf=3DB016899-N000001- Any help is much appreciated, thanks in advance! T. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple