From: Andrew Burgess <aburgess@redhat.com>
To: Simon Marchi <simon.marchi@polymtl.ca>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] gdb: unbuffer all input streams when not using readline
Date: Tue, 18 Jan 2022 17:09:13 +0000 [thread overview]
Message-ID: <20220118170913.GH622389@redhat.com> (raw)
In-Reply-To: <ebeb5097-057d-6681-bb6f-e2fadb776cab@polymtl.ca>
* Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> [2022-01-18 11:26:42 -0500]:
>
>
>
>
> On 2022-01-17 11:40, Andrew Burgess via Gdb-patches wrote:
>
> > This commit should fix PR gdb/28711. What's actually going on is
>
> > pretty involved, and there's still a bit of the story that I don't
>
> > understand completely, however, from my observed results, I think that
>
> > the change I propose making here (or something very similar) is going
>
> > to be needed.
>
> >
>
> > The original bug report involves using eclipse to drive gdb using mi
>
> > commands. A separate tty is spun off in which to send gdb the mi
>
> > commands, this tty is created using the new-ui command.
>
> >
>
> > The behaviour observed is that, given a particular set of mi commands
>
> > being sent to gdb, we sometimes see an ESPIPE error from a lseek
>
> > call, which ultimately results in gdb terminating.
>
> >
>
> > The problems all originate from gdb_readline_no_editing_callback in
>
> > gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
>
> > allow glibc to perform buffering on the FILE object being used.
>
> >
>
> > I say sometime, because, gdb_readline_no_editing_callback already
>
> > includes a call to disable the glibc buffering, but this is only done
>
> > if the input stream is not a tty. In our case the input stream is a
>
> > tty, so the buffering is left in place.
>
> >
>
> > The first step to understanding why this problem occurs is to
>
> > understand that eclipse sends multiple commands to gdb very quickly
>
> > without waiting for and answer to each command, eclipse plans to
>
> > collect all of the command results after sending all the commands to
>
> > gdb. In fact, eclipse sends the commands to gdb that they appear to
>
> > arrive in the gdb process as a single block of data. When reproducing
>
> > this issue within the testsuite I find it necessary to send multiple
>
> > commands using a single write call.
>
> >
>
> > The next bit of the story gets a little involved, and this is where my
>
> > understanding is not complete. I can describe the behaviour that I
>
> > observe, and (for me at least) I'm happy that what I'm seeing, if a
>
> > little strange, is consistent. In order to fully understand what's
>
> > going on I think I would likely need to dive into kernel code, which
>
> > currently seems unnecessary given that I'm happy with the solution I'm
>
> > proposing.
>
> >
>
> > The following description all relates to input from a tty in which I'm
>
> > not using readline. I see the same problems either when using a
>
> > new-ui tty, or with gdb's standard, non-readline, mi tty.
>
> >
>
> > Here's what I observe happening when I send multiple commands to gdb
>
> > using a single write, if I send gdb this:
>
> >
>
> > command_1\ncommand_2\ncommand_3
>
> >
>
> > then gdb's event loop will wake up (from its select) as it sees there
>
> > is input available. We call into gdb_readline_no_editing_callback,
>
> > where we call fgetc, glibc will do a single big read, and get back
>
> > just:
>
> >
>
> > command_1\n
>
> >
>
> > that is, despite there being multiple lines of input available, I
>
> > consistently get just a single line. From glibc a single character is
>
> > returned from the fgetc call, and within gdb we accumulate characters,
>
> > one at a time, into our own buffer. Eventually gdb sees the '\n'
>
> > character, and dispatches the whole 'command_1' into gdb's command
>
> > handler, which processes the command and prints the result. We then
>
> > return to gdb_readline_no_editing_callback, which in turn returns to
>
> > gdb's event loop where we re-enter the select.
>
> >
>
> > Inside the select we immediately see that there is more input waiting
>
> > on the input stream, drop out of the select, and call back into
>
> > gdb_readline_no_editing_callback. In this function we again call
>
> > fgetc where glibc performs another big read. This time glibc gets:
>
> >
>
> > command_2\n
>
> >
>
> > that is, we once again get just a single line, despite there being a
>
> > third line available. Just like the first command we copy the whole
>
> > string, character by character into gdb's buffer, then handle the
>
> > command. After handling the command we go to the event loop, enter,
>
> > and then exit the select, and call back to the function
>
> > gdb_readline_no_editing_callback.
>
> >
>
> > In gdb_readline_no_editing_callback we again call fgetc, this time
>
> > glibc gets the string:
>
> >
>
> > command_3\n
>
> >
>
> > like before, we copy this to gdb's buffer and handle the command, then
>
> > we return to the event loop. At this point the select blocks while we
>
> > wait for more input to arrive.
>
> >
>
> > The important bit of this is that someone, somewhere is, it appears,
>
> > taking care to split the incoming write into lines.
>
> >
>
> > My next experiment is to try something like:
>
> >
>
> > this_is_a_very_long_command\nshort_command\n
>
> >
>
> > However, I actually make 'this_is_a_very_long_command' very long, as
>
> > in many hundreds of characters long. One way to do this is:
>
> >
>
> > echo xxxxxx.....xxxxx
>
> >
>
> > and just adding more and more 'x' characters as needed. What I'm
>
> > aiming for is to have the first command be longer than glibc's
>
> > internal read buffer, which, on my machine, is 1024 characters.
>
> >
>
> > However, for this discussion, lets imagine that glibc's buffer is just
>
> > 8 characters (we can create just this situation by adding a suitable
>
> > setbuf call into gdb_readline_no_editing_callback).
>
> >
>
> > Now, if I send gdb this data:
>
> >
>
> > abcdefghij\nkl\n
>
> >
>
> > The first read from glibc will get 'abcdefgh', that is a full 8
>
> > character buffer. Once gdb has copied these to its buffer we call
>
> > fgetc again, and now glibc will get 'ij\n', that is, just like before,
>
> > multiple lines are split at the '\n' character. The full command,
>
> > which is now in gdb's buffer can be handled 'abcdefghij', after which
>
> > we go (via the event loop) back to gdb_readline_no_editing_callback.
>
> > Now we call fgetc, and glibc will get 'kl\n', which is then handled in
>
> > the normal way.
>
> >
>
> > So far, so good. However, there is, apparently, one edge case where
>
> > the above rules don't apply.
>
> >
>
> > If the '\n' character is the first character read from the kernel,
>
> > then the incoming lines are not split up. So, given glibc's 8
>
> > character buffer, if I send gdb this:
>
> >
>
> > abcdefgh\nkl\n
>
> >
>
> > that is the first command is 8 characters plus a newline, then, on the
>
> > first read (from within glibc) we get 'abcdefgh' in a single buffer.
>
> > As there's no newline gdb calls fgetc again, and glibc does another
>
> > large read, now we get:
>
> >
>
> > \nkl\n
>
> >
>
> > which doesn't follow the above pattern - the lines are not split into
>
> > separate buffers!
>
> >
>
> > So, gdb reads the first character from glibc using fgetc, this is the
>
> > newline. Now gdb has a complete command, and so the command is
>
> > handled. We then return to the event loop and enter the select.
>
> >
>
> > The problem is that, as far as the kernel is concerned, there is no
>
> > more input pending, it's all been read into glibc's buffer, and so the
>
> > select doesn't return. The second command is basically stuck in
>
> > glibc's buffer.
>
> >
>
> > If I send another command to gdb, or even just send an empty
>
> > command (a lone newline) then the select returns, we call into
>
> > gdb_readline_no_editing_callback, and now gdb sees the second
>
> > command.
>
> >
>
> > OK, so the above is interesting, but it doesn't explain the ESPIPE
>
> > error.
>
> >
>
> > Well, that's a slightly different, but related issue. The ESPIPE
>
> > case will _only_ show up when using new-ui to create the separate tty
>
> > for mi commands, and is a consequence of this commit:
>
> >
>
> > commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
>
> > Date: Thu Jul 18 17:20:04 2019 +0100
>
> >
>
> > Fix for using named pipes on Windows
>
> >
>
> > Prior to this commit, the new-ui command would open the tty three
>
> > times, once each for stdin, stderr, and stdout. After this commit we
>
> > open the tty just once and reuse the FILE object for all three roles.
>
> >
>
> > Consider the problem case, where glibc has (unexpectedly) read the
>
> > second command into its internal buffer. When we handle the first
>
> > command we usually end up having to write something to the mi output
>
> > stream.
>
> >
>
> > After the above commit the same FILE object represents both the input
>
> > and output streams, so, when gdb tries to write to the FILE object,
>
> > glibc spots that there is input pending within the input buffer, and
>
> > so assumes that we have read ahead of where we should be in the input
>
> > file. To correct for this glibc tries to do an lseek call to
>
> > reposition the file offset of the output stream prior to writing to
>
> > it. However, as the output stream is a tty, and seeking is not
>
> > supported on a tty, this lseek call fails, this results in the ESPIPE,
>
> > which ultimately causes gdb to terminate.
>
> >
>
> > So, now we understand why the ESPIPE triggers (which was what caused
>
> > the gdb crash in the original bug report), and we also understand that
>
> > sometime gdb will not handle the second command in a timely
>
> > fashion (if the first command is just the wrong length). So, what to
>
> > do about all this?
>
> >
>
> > We could revert the commit mentioned above (and implement its
>
> > functionality another way). This would certainly resolve the ESPIPE
>
> > issue, the buffered input would now only be on the input stream, the
>
> > output stream would have no buffered input, and so glibc would never
>
> > try to lseek, and so we'd never get the ESPIPE error.
>
> >
>
> > However, this only solves one of the two problems. We would still
>
> > suffer from the problem where, if the first command is just the wrong
>
> > length, the second command will not (immediately) get handled.
>
> >
>
> > The only solution I can see to this problem is to unbuffer the input
>
> > stream. If glibc is not buffering the input, but instead, we read
>
> > incoming data character by character from the kernel, then everything
>
> > will be fine. As soon as we see the newline at the end of the first
>
> > command we will handle the first command. As glibc will have no
>
> > buffered input it will not be tempted to lseek, so no ESPIPE error.
>
> > When we go have to the event loop there will be more data pending in
>
> > the kernel, so the select will immediately return, and the second
>
> > command will be processed.
>
> >
>
> > I'm tempted to suggest that we should move the unbuffering of the
>
> > input stream out of gdb_readline_no_editing_callback and do it
>
> > somewhere earlier, more like when we create the input streams.
>
> > However, I've not done that in this commit for a couple of reasons:
>
> >
>
> > 1. By keeping the unbuffering in gdb_readline_no_editing_callback
>
> > I'm making the smallest possible change that fixes the bug. Moving
>
> > the unbuffering somewhere better can be done as a refactor later, if
>
> > that 's felt to be important,
>
> >
>
> > 2. I don't think making repeated calls to unbuffer the input will
>
> > have that much performance impact. We only make the unbuffer call
>
> > once per call to gdb_readline_no_editing_callback, and, if the input
>
> > stream is already unbuffered we'll return pretty quickly, so I don't
>
> > see this as being massively costly,
>
> >
>
> > 3. Tom is currently doing lots of gdb stream management changes and
>
> > I want to minimise the chances we'll conflict.
>
> >
>
> > So, this commit just changes gdb_readline_no_editing_callback to
>
> > always unbuffer the input stream.
>
> >
>
> > The test for this issue sends two commands in a loop, with the first
>
> > command growing bigger each time around the loop. I actually make the
>
> > first command bigger by just adding whitespace to the front, as gdb
>
> > still has to read the complete command (including whitespace) via
>
> > glibc, so this is enough to trigger the bug.
>
> >
>
> > The original bug was reported when using a virtual machine, and in
>
> > this situation we see this in the strace output:
>
> >
>
> > read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
>
> > read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
>
> >
>
> > I'm not completely sure what's going on here, but it appears that the
>
> > kernel on the virtual machine is delivering the input to glibc slower
>
> > than I see on my real hardware; glibc asks for 1024 bytes, but only
>
> > gets 64 bytes the first time. In the second read we see the problem
>
> > case, the first character is the newline, but then the entire second
>
> > command is included.
>
> >
>
> > If I run this exact example on my real hardware then the first command
>
> > would not be truncated at 64 bytes, instead, I'd expect to see the
>
> > newline included in the first read, with the second command split into
>
> > a second read.
>
> >
>
> > So, for testing, I check cases where the first command is just a few
>
> > characters (starting at 8 character), all the way up to 2048
>
> > characters. Hopefully, this should mean we hit the problem case for
>
> > most machine setups.
>
> >
>
> > The only last question relates to commit afe09f0b6311a4d that I
>
> > mentioned earlier. That commit was intended to provide support for
>
> > Microsoft named pipes:
>
> >
>
> > https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
>
> >
>
> > I know next to nothing about this topic beyond a brief scan of the
>
> > above link, but I think these windows named pipe are closer in
>
> > behaviour to unix sockets than to unix named fifos.
>
> >
>
> > I am a little nervous that, after the above commit, we now use the
>
> > same FILE for in, err, and out streams. In contrast, in a vanilla C
>
> > program, I would expect different FILE objects for each stream.
>
> >
>
> > Still, I'm reluctant to revert the above commit (and provide the same
>
> > functionality a different way) without a specific bug to point at,
>
> > and, now that the streams are unbuffered, I expect a lot of the read
>
> > and write calls are going straight to the kernel with minimal glibc
>
> > involvement, so maybe it doesn't really matter. Anyway, I haven't
>
> > touched the above patch, but it is something to keep in mind when
>
> > working in this area.
>
> >
>
> > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
>
>
>
> The change looks ok to me (better than the status quo), given that
>
> correctness is more important than performance.
>
>
>
> I'm just wondering if there's a noticeable performance difference
>
> between having the input buffered vs unbuffered. Calling fgetc with
>
> unbuffered input means we do one syscall per character. With frontends
>
> sending tons of commands, it could possibly affect the responsiveness
>
> and degrade user experience. But it's just a guess, we should be able
>
> to measure it.
>
>
>
> How difficult would it be to just have different behaviors on Windows,
>
> opening the stream only once, vs Linux, opening three streams?
I think it would be easy enough to implement new-ui differently for
unix vs for windows.
But, if I've left you with the impression that this is a possible
solution, then I failed with my commit message. Even on Linux, when
we're not using new-ui at all, we currently can loose commands unless
we switch to unbuffered input.
Another option; maybe there's some ioctl that can be used to tell the
kernel that a newline at the start of a read request should be
considered a line break. However, if such an ioctl exits, I've failed
to find it.
The final option that I considered is to place glibc into unbuffered
mode, and then do the buffering ourselves within gdb. This obviously
adds a load more complexity which is why I'd rather not do this.
But, if we did do this, then at the end of
gdb_readline_no_editing_callback we could set the flag
call_stdin_event_handler_again_p, and things would "just work".
I can take a look at gathering some performance figures for sure and
will update once I have some data, but this patch is fixing an
existing bug...
Thanks,
Andrew
next prev parent reply other threads:[~2022-01-18 17:09 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-17 16:40 Andrew Burgess
2022-01-18 16:26 ` Simon Marchi
2022-01-18 17:09 ` Andrew Burgess [this message]
2022-01-18 17:57 ` Simon Marchi
2022-01-18 18:09 ` Andrew Burgess
2022-01-18 18:59 ` Tom Tromey
2022-02-02 16:23 ` Andrew Burgess
2022-02-07 10:27 ` Andrew Burgess
2022-01-18 18:52 ` Tom Tromey
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220118170913.GH622389@redhat.com \
--to=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=simon.marchi@polymtl.ca \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).