From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id BEF363858417; Mon, 7 Feb 2022 10:25:07 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BEF363858417 From: "cvs-commit at gcc dot gnu.org" To: gdb-prs@sourceware.org Subject: [Bug mi/28711] gdb closes when displaying structs with long field names in eclipse Date: Mon, 07 Feb 2022 10:25:07 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gdb X-Bugzilla-Component: mi X-Bugzilla-Version: HEAD X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gdb-prs@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-prs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Feb 2022 10:25:07 -0000 https://sourceware.org/bugzilla/show_bug.cgi?id=3D28711 --- Comment #9 from cvs-commit at gcc dot gnu.org --- The master branch has been updated by Andrew Burgess : https://sourceware.org/git/gitweb.cgi?p=3Dbinutils-gdb.git;h=3Dd08cbc5d3203= 118da5583296e49273cf82378042 commit d08cbc5d3203118da5583296e49273cf82378042 Author: Andrew Burgess Date: Wed Dec 22 12:57:44 2021 +0000 gdb: unbuffer all input streams when not using readline This commit should fix PR gdb/28711. What's actually going on is pretty involved, and there's still a bit of the story that I don't understand completely, however, from my observed results, I think that the change I propose making here (or something very similar) is going to be needed. The original bug report involves using eclipse to drive gdb using mi commands. A separate tty is spun off in which to send gdb the mi commands, this tty is created using the new-ui command. The behaviour observed is that, given a particular set of mi commands being sent to gdb, we sometimes see an ESPIPE error from a lseek call, which ultimately results in gdb terminating. The problems all originate from gdb_readline_no_editing_callback in gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and allow glibc to perform buffering on the FILE object being used. I say sometime, because, gdb_readline_no_editing_callback already includes a call to disable the glibc buffering, but this is only done if the input stream is not a tty. In our case the input stream is a tty, so the buffering is left in place. The first step to understanding why this problem occurs is to understand that eclipse sends multiple commands to gdb very quickly without waiting for and answer to each command, eclipse plans to collect all of the command results after sending all the commands to gdb. In fact, eclipse sends the commands to gdb that they appear to arrive in the gdb process as a single block of data. When reproducing this issue within the testsuite I find it necessary to send multiple commands using a single write call. The next bit of the story gets a little involved, and this is where my understanding is not complete. I can describe the behaviour that I observe, and (for me at least) I'm happy that what I'm seeing, if a little strange, is consistent. In order to fully understand what's going on I think I would likely need to dive into kernel code, which currently seems unnecessary given that I'm happy with the solution I'm proposing. The following description all relates to input from a tty in which I'm not using readline. I see the same problems either when using a new-ui tty, or with gdb's standard, non-readline, mi tty. Here's what I observe happening when I send multiple commands to gdb using a single write, if I send gdb this: command_1\ncommand_2\ncommand_3 then gdb's event loop will wake up (from its select) as it sees there is input available. We call into gdb_readline_no_editing_callback, where we call fgetc, glibc will do a single big read, and get back just: command_1\n that is, despite there being multiple lines of input available, I consistently get just a single line. From glibc a single character is returned from the fgetc call, and within gdb we accumulate characters, one at a time, into our own buffer. Eventually gdb sees the '\n' character, and dispatches the whole 'command_1' into gdb's command handler, which processes the command and prints the result. We then return to gdb_readline_no_editing_callback, which in turn returns to gdb's event loop where we re-enter the select. Inside the select we immediately see that there is more input waiting on the input stream, drop out of the select, and call back into gdb_readline_no_editing_callback. In this function we again call fgetc where glibc performs another big read. This time glibc gets: command_2\n that is, we once again get just a single line, despite there being a third line available. Just like the first command we copy the whole string, character by character into gdb's buffer, then handle the command. After handling the command we go to the event loop, enter, and then exit the select, and call back to the function gdb_readline_no_editing_callback. In gdb_readline_no_editing_callback we again call fgetc, this time glibc gets the string: command_3\n like before, we copy this to gdb's buffer and handle the command, then we return to the event loop. At this point the select blocks while we wait for more input to arrive. The important bit of this is that someone, somewhere is, it appears, taking care to split the incoming write into lines. My next experiment is to try something like: this_is_a_very_long_command\nshort_command\n However, I actually make 'this_is_a_very_long_command' very long, as in many hundreds of characters long. One way to do this is: echo xxxxxx.....xxxxx and just adding more and more 'x' characters as needed. What I'm aiming for is to have the first command be longer than glibc's internal read buffer, which, on my machine, is 1024 characters. However, for this discussion, lets imagine that glibc's buffer is just 8 characters (we can create just this situation by adding a suitable setbuf call into gdb_readline_no_editing_callback). Now, if I send gdb this data: abcdefghij\nkl\n The first read from glibc will get 'abcdefgh', that is a full 8 character buffer. Once gdb has copied these to its buffer we call fgetc again, and now glibc will get 'ij\n', that is, just like before, multiple lines are split at the '\n' character. The full command, which is now in gdb's buffer can be handled 'abcdefghij', after which we go (via the event loop) back to gdb_readline_no_editing_callback. Now we call fgetc, and glibc will get 'kl\n', which is then handled in the normal way. So far, so good. However, there is, apparently, one edge case where the above rules don't apply. If the '\n' character is the first character read from the kernel, then the incoming lines are not split up. So, given glibc's 8 character buffer, if I send gdb this: abcdefgh\nkl\n that is the first command is 8 characters plus a newline, then, on the first read (from within glibc) we get 'abcdefgh' in a single buffer. As there's no newline gdb calls fgetc again, and glibc does another large read, now we get: \nkl\n which doesn't follow the above pattern - the lines are not split into separate buffers! So, gdb reads the first character from glibc using fgetc, this is the newline. Now gdb has a complete command, and so the command is handled. We then return to the event loop and enter the select. The problem is that, as far as the kernel is concerned, there is no more input pending, it's all been read into glibc's buffer, and so the select doesn't return. The second command is basically stuck in glibc's buffer. If I send another command to gdb, or even just send an empty command (a lone newline) then the select returns, we call into gdb_readline_no_editing_callback, and now gdb sees the second command. OK, so the above is interesting, but it doesn't explain the ESPIPE error. Well, that's a slightly different, but related issue. The ESPIPE case will _only_ show up when using new-ui to create the separate tty for mi commands, and is a consequence of this commit: commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8 Date: Thu Jul 18 17:20:04 2019 +0100 Fix for using named pipes on Windows Prior to this commit, the new-ui command would open the tty three times, once each for stdin, stderr, and stdout. After this commit we open the tty just once and reuse the FILE object for all three roles. Consider the problem case, where glibc has (unexpectedly) read the second command into its internal buffer. When we handle the first command we usually end up having to write something to the mi output stream. After the above commit the same FILE object represents both the input and output streams, so, when gdb tries to write to the FILE object, glibc spots that there is input pending within the input buffer, and so assumes that we have read ahead of where we should be in the input file. To correct for this glibc tries to do an lseek call to reposition the file offset of the output stream prior to writing to it. However, as the output stream is a tty, and seeking is not supported on a tty, this lseek call fails, this results in the ESPIPE, which ultimately causes gdb to terminate. So, now we understand why the ESPIPE triggers (which was what caused the gdb crash in the original bug report), and we also understand that sometime gdb will not handle the second command in a timely fashion (if the first command is just the wrong length). So, what to do about all this? We could revert the commit mentioned above (and implement its functionality another way). This would certainly resolve the ESPIPE issue, the buffered input would now only be on the input stream, the output stream would have no buffered input, and so glibc would never try to lseek, and so we'd never get the ESPIPE error. However, this only solves one of the two problems. We would still suffer from the problem where, if the first command is just the wrong length, the second command will not (immediately) get handled. The only solution I can see to this problem is to unbuffer the input stream. If glibc is not buffering the input, but instead, we read incoming data character by character from the kernel, then everything will be fine. As soon as we see the newline at the end of the first command we will handle the first command. As glibc will have no buffered input it will not be tempted to lseek, so no ESPIPE error. When we go have to the event loop there will be more data pending in the kernel, so the select will immediately return, and the second command will be processed. I'm tempted to suggest that we should move the unbuffering of the input stream out of gdb_readline_no_editing_callback and do it somewhere earlier, more like when we create the input streams. However, I've not done that in this commit for a couple of reasons: 1. By keeping the unbuffering in gdb_readline_no_editing_callback I'm making the smallest possible change that fixes the bug. Moving the unbuffering somewhere better can be done as a refactor later, if that 's felt to be important, 2. I don't think making repeated calls to unbuffer the input will have that much performance impact. We only make the unbuffer call once per call to gdb_readline_no_editing_callback, and, if the input stream is already unbuffered we'll return pretty quickly, so I don't see this as being massively costly, 3. Tom is currently doing lots of gdb stream management changes and I want to minimise the chances we'll conflict. So, this commit just changes gdb_readline_no_editing_callback to always unbuffer the input stream. The test for this issue sends two commands in a loop, with the first command growing bigger each time around the loop. I actually make the first command bigger by just adding whitespace to the front, as gdb still has to read the complete command (including whitespace) via glibc, so this is enough to trigger the bug. The original bug was reported when using a virtual machine, and in this situation we see this in the strace output: read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) =3D 64 read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) =3D 67 I'm not completely sure what's going on here, but it appears that the kernel on the virtual machine is delivering the input to glibc slower than I see on my real hardware; glibc asks for 1024 bytes, but only gets 64 bytes the first time. In the second read we see the problem case, the first character is the newline, but then the entire second command is included. If I run this exact example on my real hardware then the first command would not be truncated at 64 bytes, instead, I'd expect to see the newline included in the first read, with the second command split into a second read. So, for testing, I check cases where the first command is just a few characters (starting at 8 character), all the way up to 2048 characters. Hopefully, this should mean we hit the problem case for most machine setups. The only last question relates to commit afe09f0b6311a4d that I mentioned earlier. That commit was intended to provide support for Microsoft named pipes: https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes I know next to nothing about this topic beyond a brief scan of the above link, but I think these windows named pipe are closer in behaviour to unix sockets than to unix named fifos. I am a little nervous that, after the above commit, we now use the same FILE for in, err, and out streams. In contrast, in a vanilla C program, I would expect different FILE objects for each stream. Still, I'm reluctant to revert the above commit (and provide the same functionality a different way) without a specific bug to point at, and, now that the streams are unbuffered, I expect a lot of the read and write calls are going straight to the kernel with minimal glibc involvement, so maybe it doesn't really matter. Anyway, I haven't touched the above patch, but it is something to keep in mind when working in this area. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=3D28711 --=20 You are receiving this mail because: You are on the CC list for the bug.=