From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id DAAF73858D39 for ; Tue, 18 Jan 2022 17:09:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DAAF73858D39 Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-637-y7UKVVWEMKWHx607kXNW-w-1; Tue, 18 Jan 2022 12:09:17 -0500 X-MC-Unique: y7UKVVWEMKWHx607kXNW-w-1 Received: by mail-wm1-f72.google.com with SMTP id n25-20020a05600c3b9900b00348b83fbd0dso2526769wms.0 for ; Tue, 18 Jan 2022 09:09:17 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=uE92G7nnUwY/Ere9ySmYCrXOIuZKm5tcyXQ5qPy6FzY=; b=iofK2DZgmUOQX+JfecYZFSOpPCkon6R1z/+2B7w9Vfw8WBJsJWS4NsIZDVPCdO7SMk QTFLAVChxcz/zlMTUauUvAumB/vv7EJgdXE7zGm4W1hzBHadr7SwSNqQH9dOnClUiPJc FtSHr6Tn0SfzL/Cx3AE8pXrpa1YFgVIGkjSh0cVk5P/rjQlW5j3wRBFYGBkBnu8AbPi6 3GO3lWBNIUEdHYMSK95RB11Rlx1K3M0g8+vo128Q5C+3/R4xrMd7jSJsriKFZ+cXm2H1 mS2rA5JPeQCHlwjfTK+WyBN43fw6zOGPgy3CbP5hRv85qSNu2Z/NCg19pXyg5YzBnhGY oyIw== X-Gm-Message-State: AOAM532Ej/KVDXCEt3XGxuiKL0ybgu619niHqbHxRiGdjiSVXiFISum+ mwf6+5xWKoYG5BMx6XUgyqYexH1Z6UL74yPs5B/Y51b6yx8lc8KWw/kl8Xrh3Id1VP845TKOyRG s0YbRUkrcTN2mxxBSCJJafQ== X-Received: by 2002:a05:6000:15c9:: with SMTP id y9mr23749940wry.689.1642525755599; Tue, 18 Jan 2022 09:09:15 -0800 (PST) X-Google-Smtp-Source: ABdhPJxYtpxu5t6AgipIS7Aw2j6BGkdoFgUVJj9QSbZpal2EpJyQHc+JhI+BAJN3cqzPgdrcvTWu5Q== X-Received: by 2002:a05:6000:15c9:: with SMTP id y9mr23749912wry.689.1642525755156; Tue, 18 Jan 2022 09:09:15 -0800 (PST) Received: from localhost (host86-188-49-82.range86-188.btcentralplus.com. [86.188.49.82]) by smtp.gmail.com with ESMTPSA id e10sm20703501wrq.40.2022.01.18.09.09.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Jan 2022 09:09:14 -0800 (PST) Date: Tue, 18 Jan 2022 17:09:13 +0000 From: Andrew Burgess To: Simon Marchi Cc: gdb-patches@sourceware.org Subject: Re: [PATCH] gdb: unbuffer all input streams when not using readline Message-ID: <20220118170913.GH622389@redhat.com> References: <20220117164051.1854133-1-aburgess@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Operating-System: Linux/5.8.18-100.fc31.x86_64 (x86_64) X-Uptime: 16:48:45 up 17 days, 1:42, X-Editor: GNU Emacs [ http://www.gnu.org/software/emacs ] X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-6.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Jan 2022 17:09:26 -0000 * Simon Marchi via Gdb-patches [2022-01-18 11:26:42 -0500]: > > > > > On 2022-01-17 11:40, Andrew Burgess via Gdb-patches wrote: > > > This commit should fix PR gdb/28711. What's actually going on is > > > pretty involved, and there's still a bit of the story that I don't > > > understand completely, however, from my observed results, I think that > > > the change I propose making here (or something very similar) is going > > > to be needed. > > > > > > The original bug report involves using eclipse to drive gdb using mi > > > commands. A separate tty is spun off in which to send gdb the mi > > > commands, this tty is created using the new-ui command. > > > > > > The behaviour observed is that, given a particular set of mi commands > > > being sent to gdb, we sometimes see an ESPIPE error from a lseek > > > call, which ultimately results in gdb terminating. > > > > > > The problems all originate from gdb_readline_no_editing_callback in > > > gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and > > > allow glibc to perform buffering on the FILE object being used. > > > > > > I say sometime, because, gdb_readline_no_editing_callback already > > > includes a call to disable the glibc buffering, but this is only done > > > if the input stream is not a tty. In our case the input stream is a > > > tty, so the buffering is left in place. > > > > > > The first step to understanding why this problem occurs is to > > > understand that eclipse sends multiple commands to gdb very quickly > > > without waiting for and answer to each command, eclipse plans to > > > collect all of the command results after sending all the commands to > > > gdb. In fact, eclipse sends the commands to gdb that they appear to > > > arrive in the gdb process as a single block of data. When reproducing > > > this issue within the testsuite I find it necessary to send multiple > > > commands using a single write call. > > > > > > The next bit of the story gets a little involved, and this is where my > > > understanding is not complete. I can describe the behaviour that I > > > observe, and (for me at least) I'm happy that what I'm seeing, if a > > > little strange, is consistent. In order to fully understand what's > > > going on I think I would likely need to dive into kernel code, which > > > currently seems unnecessary given that I'm happy with the solution I'm > > > proposing. > > > > > > The following description all relates to input from a tty in which I'm > > > not using readline. I see the same problems either when using a > > > new-ui tty, or with gdb's standard, non-readline, mi tty. > > > > > > Here's what I observe happening when I send multiple commands to gdb > > > using a single write, if I send gdb this: > > > > > > command_1\ncommand_2\ncommand_3 > > > > > > then gdb's event loop will wake up (from its select) as it sees there > > > is input available. We call into gdb_readline_no_editing_callback, > > > where we call fgetc, glibc will do a single big read, and get back > > > just: > > > > > > command_1\n > > > > > > that is, despite there being multiple lines of input available, I > > > consistently get just a single line. From glibc a single character is > > > returned from the fgetc call, and within gdb we accumulate characters, > > > one at a time, into our own buffer. Eventually gdb sees the '\n' > > > character, and dispatches the whole 'command_1' into gdb's command > > > handler, which processes the command and prints the result. We then > > > return to gdb_readline_no_editing_callback, which in turn returns to > > > gdb's event loop where we re-enter the select. > > > > > > Inside the select we immediately see that there is more input waiting > > > on the input stream, drop out of the select, and call back into > > > gdb_readline_no_editing_callback. In this function we again call > > > fgetc where glibc performs another big read. This time glibc gets: > > > > > > command_2\n > > > > > > that is, we once again get just a single line, despite there being a > > > third line available. Just like the first command we copy the whole > > > string, character by character into gdb's buffer, then handle the > > > command. After handling the command we go to the event loop, enter, > > > and then exit the select, and call back to the function > > > gdb_readline_no_editing_callback. > > > > > > In gdb_readline_no_editing_callback we again call fgetc, this time > > > glibc gets the string: > > > > > > command_3\n > > > > > > like before, we copy this to gdb's buffer and handle the command, then > > > we return to the event loop. At this point the select blocks while we > > > wait for more input to arrive. > > > > > > The important bit of this is that someone, somewhere is, it appears, > > > taking care to split the incoming write into lines. > > > > > > My next experiment is to try something like: > > > > > > this_is_a_very_long_command\nshort_command\n > > > > > > However, I actually make 'this_is_a_very_long_command' very long, as > > > in many hundreds of characters long. One way to do this is: > > > > > > echo xxxxxx.....xxxxx > > > > > > and just adding more and more 'x' characters as needed. What I'm > > > aiming for is to have the first command be longer than glibc's > > > internal read buffer, which, on my machine, is 1024 characters. > > > > > > However, for this discussion, lets imagine that glibc's buffer is just > > > 8 characters (we can create just this situation by adding a suitable > > > setbuf call into gdb_readline_no_editing_callback). > > > > > > Now, if I send gdb this data: > > > > > > abcdefghij\nkl\n > > > > > > The first read from glibc will get 'abcdefgh', that is a full 8 > > > character buffer. Once gdb has copied these to its buffer we call > > > fgetc again, and now glibc will get 'ij\n', that is, just like before, > > > multiple lines are split at the '\n' character. The full command, > > > which is now in gdb's buffer can be handled 'abcdefghij', after which > > > we go (via the event loop) back to gdb_readline_no_editing_callback. > > > Now we call fgetc, and glibc will get 'kl\n', which is then handled in > > > the normal way. > > > > > > So far, so good. However, there is, apparently, one edge case where > > > the above rules don't apply. > > > > > > If the '\n' character is the first character read from the kernel, > > > then the incoming lines are not split up. So, given glibc's 8 > > > character buffer, if I send gdb this: > > > > > > abcdefgh\nkl\n > > > > > > that is the first command is 8 characters plus a newline, then, on the > > > first read (from within glibc) we get 'abcdefgh' in a single buffer. > > > As there's no newline gdb calls fgetc again, and glibc does another > > > large read, now we get: > > > > > > \nkl\n > > > > > > which doesn't follow the above pattern - the lines are not split into > > > separate buffers! > > > > > > So, gdb reads the first character from glibc using fgetc, this is the > > > newline. Now gdb has a complete command, and so the command is > > > handled. We then return to the event loop and enter the select. > > > > > > The problem is that, as far as the kernel is concerned, there is no > > > more input pending, it's all been read into glibc's buffer, and so the > > > select doesn't return. The second command is basically stuck in > > > glibc's buffer. > > > > > > If I send another command to gdb, or even just send an empty > > > command (a lone newline) then the select returns, we call into > > > gdb_readline_no_editing_callback, and now gdb sees the second > > > command. > > > > > > OK, so the above is interesting, but it doesn't explain the ESPIPE > > > error. > > > > > > Well, that's a slightly different, but related issue. The ESPIPE > > > case will _only_ show up when using new-ui to create the separate tty > > > for mi commands, and is a consequence of this commit: > > > > > > commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8 > > > Date: Thu Jul 18 17:20:04 2019 +0100 > > > > > > Fix for using named pipes on Windows > > > > > > Prior to this commit, the new-ui command would open the tty three > > > times, once each for stdin, stderr, and stdout. After this commit we > > > open the tty just once and reuse the FILE object for all three roles. > > > > > > Consider the problem case, where glibc has (unexpectedly) read the > > > second command into its internal buffer. When we handle the first > > > command we usually end up having to write something to the mi output > > > stream. > > > > > > After the above commit the same FILE object represents both the input > > > and output streams, so, when gdb tries to write to the FILE object, > > > glibc spots that there is input pending within the input buffer, and > > > so assumes that we have read ahead of where we should be in the input > > > file. To correct for this glibc tries to do an lseek call to > > > reposition the file offset of the output stream prior to writing to > > > it. However, as the output stream is a tty, and seeking is not > > > supported on a tty, this lseek call fails, this results in the ESPIPE, > > > which ultimately causes gdb to terminate. > > > > > > So, now we understand why the ESPIPE triggers (which was what caused > > > the gdb crash in the original bug report), and we also understand that > > > sometime gdb will not handle the second command in a timely > > > fashion (if the first command is just the wrong length). So, what to > > > do about all this? > > > > > > We could revert the commit mentioned above (and implement its > > > functionality another way). This would certainly resolve the ESPIPE > > > issue, the buffered input would now only be on the input stream, the > > > output stream would have no buffered input, and so glibc would never > > > try to lseek, and so we'd never get the ESPIPE error. > > > > > > However, this only solves one of the two problems. We would still > > > suffer from the problem where, if the first command is just the wrong > > > length, the second command will not (immediately) get handled. > > > > > > The only solution I can see to this problem is to unbuffer the input > > > stream. If glibc is not buffering the input, but instead, we read > > > incoming data character by character from the kernel, then everything > > > will be fine. As soon as we see the newline at the end of the first > > > command we will handle the first command. As glibc will have no > > > buffered input it will not be tempted to lseek, so no ESPIPE error. > > > When we go have to the event loop there will be more data pending in > > > the kernel, so the select will immediately return, and the second > > > command will be processed. > > > > > > I'm tempted to suggest that we should move the unbuffering of the > > > input stream out of gdb_readline_no_editing_callback and do it > > > somewhere earlier, more like when we create the input streams. > > > However, I've not done that in this commit for a couple of reasons: > > > > > > 1. By keeping the unbuffering in gdb_readline_no_editing_callback > > > I'm making the smallest possible change that fixes the bug. Moving > > > the unbuffering somewhere better can be done as a refactor later, if > > > that 's felt to be important, > > > > > > 2. I don't think making repeated calls to unbuffer the input will > > > have that much performance impact. We only make the unbuffer call > > > once per call to gdb_readline_no_editing_callback, and, if the input > > > stream is already unbuffered we'll return pretty quickly, so I don't > > > see this as being massively costly, > > > > > > 3. Tom is currently doing lots of gdb stream management changes and > > > I want to minimise the chances we'll conflict. > > > > > > So, this commit just changes gdb_readline_no_editing_callback to > > > always unbuffer the input stream. > > > > > > The test for this issue sends two commands in a loop, with the first > > > command growing bigger each time around the loop. I actually make the > > > first command bigger by just adding whitespace to the front, as gdb > > > still has to read the complete command (including whitespace) via > > > glibc, so this is enough to trigger the bug. > > > > > > The original bug was reported when using a virtual machine, and in > > > this situation we see this in the strace output: > > > > > > read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64 > > > read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67 > > > > > > I'm not completely sure what's going on here, but it appears that the > > > kernel on the virtual machine is delivering the input to glibc slower > > > than I see on my real hardware; glibc asks for 1024 bytes, but only > > > gets 64 bytes the first time. In the second read we see the problem > > > case, the first character is the newline, but then the entire second > > > command is included. > > > > > > If I run this exact example on my real hardware then the first command > > > would not be truncated at 64 bytes, instead, I'd expect to see the > > > newline included in the first read, with the second command split into > > > a second read. > > > > > > So, for testing, I check cases where the first command is just a few > > > characters (starting at 8 character), all the way up to 2048 > > > characters. Hopefully, this should mean we hit the problem case for > > > most machine setups. > > > > > > The only last question relates to commit afe09f0b6311a4d that I > > > mentioned earlier. That commit was intended to provide support for > > > Microsoft named pipes: > > > > > > https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes > > > > > > I know next to nothing about this topic beyond a brief scan of the > > > above link, but I think these windows named pipe are closer in > > > behaviour to unix sockets than to unix named fifos. > > > > > > I am a little nervous that, after the above commit, we now use the > > > same FILE for in, err, and out streams. In contrast, in a vanilla C > > > program, I would expect different FILE objects for each stream. > > > > > > Still, I'm reluctant to revert the above commit (and provide the same > > > functionality a different way) without a specific bug to point at, > > > and, now that the streams are unbuffered, I expect a lot of the read > > > and write calls are going straight to the kernel with minimal glibc > > > involvement, so maybe it doesn't really matter. Anyway, I haven't > > > touched the above patch, but it is something to keep in mind when > > > working in this area. > > > > > > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711 > > > > The change looks ok to me (better than the status quo), given that > > correctness is more important than performance. > > > > I'm just wondering if there's a noticeable performance difference > > between having the input buffered vs unbuffered. Calling fgetc with > > unbuffered input means we do one syscall per character. With frontends > > sending tons of commands, it could possibly affect the responsiveness > > and degrade user experience. But it's just a guess, we should be able > > to measure it. > > > > How difficult would it be to just have different behaviors on Windows, > > opening the stream only once, vs Linux, opening three streams? I think it would be easy enough to implement new-ui differently for unix vs for windows. But, if I've left you with the impression that this is a possible solution, then I failed with my commit message. Even on Linux, when we're not using new-ui at all, we currently can loose commands unless we switch to unbuffered input. Another option; maybe there's some ioctl that can be used to tell the kernel that a newline at the start of a read request should be considered a line break. However, if such an ioctl exits, I've failed to find it. The final option that I considered is to place glibc into unbuffered mode, and then do the buffering ourselves within gdb. This obviously adds a load more complexity which is why I'd rather not do this. But, if we did do this, then at the end of gdb_readline_no_editing_callback we could set the flag call_stdin_event_handler_again_p, and things would "just work". I can take a look at gathering some performance figures for sure and will update once I have some data, but this patch is fixing an existing bug... Thanks, Andrew