From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <aburgess@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by sourceware.org (Postfix) with ESMTPS id DAAF73858D39
 for <gdb-patches@sourceware.org>; Tue, 18 Jan 2022 17:09:23 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DAAF73858D39
Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com
 [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-637-y7UKVVWEMKWHx607kXNW-w-1; Tue, 18 Jan 2022 12:09:17 -0500
X-MC-Unique: y7UKVVWEMKWHx607kXNW-w-1
Received: by mail-wm1-f72.google.com with SMTP id
 n25-20020a05600c3b9900b00348b83fbd0dso2526769wms.0
 for <gdb-patches@sourceware.org>; Tue, 18 Jan 2022 09:09:17 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to;
 bh=uE92G7nnUwY/Ere9ySmYCrXOIuZKm5tcyXQ5qPy6FzY=;
 b=iofK2DZgmUOQX+JfecYZFSOpPCkon6R1z/+2B7w9Vfw8WBJsJWS4NsIZDVPCdO7SMk
 QTFLAVChxcz/zlMTUauUvAumB/vv7EJgdXE7zGm4W1hzBHadr7SwSNqQH9dOnClUiPJc
 FtSHr6Tn0SfzL/Cx3AE8pXrpa1YFgVIGkjSh0cVk5P/rjQlW5j3wRBFYGBkBnu8AbPi6
 3GO3lWBNIUEdHYMSK95RB11Rlx1K3M0g8+vo128Q5C+3/R4xrMd7jSJsriKFZ+cXm2H1
 mS2rA5JPeQCHlwjfTK+WyBN43fw6zOGPgy3CbP5hRv85qSNu2Z/NCg19pXyg5YzBnhGY
 oyIw==
X-Gm-Message-State: AOAM532Ej/KVDXCEt3XGxuiKL0ybgu619niHqbHxRiGdjiSVXiFISum+
 mwf6+5xWKoYG5BMx6XUgyqYexH1Z6UL74yPs5B/Y51b6yx8lc8KWw/kl8Xrh3Id1VP845TKOyRG
 s0YbRUkrcTN2mxxBSCJJafQ==
X-Received: by 2002:a05:6000:15c9:: with SMTP id
 y9mr23749940wry.689.1642525755599; 
 Tue, 18 Jan 2022 09:09:15 -0800 (PST)
X-Google-Smtp-Source: ABdhPJxYtpxu5t6AgipIS7Aw2j6BGkdoFgUVJj9QSbZpal2EpJyQHc+JhI+BAJN3cqzPgdrcvTWu5Q==
X-Received: by 2002:a05:6000:15c9:: with SMTP id
 y9mr23749912wry.689.1642525755156; 
 Tue, 18 Jan 2022 09:09:15 -0800 (PST)
Received: from localhost (host86-188-49-82.range86-188.btcentralplus.com.
 [86.188.49.82])
 by smtp.gmail.com with ESMTPSA id e10sm20703501wrq.40.2022.01.18.09.09.14
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 18 Jan 2022 09:09:14 -0800 (PST)
Date: Tue, 18 Jan 2022 17:09:13 +0000
From: Andrew Burgess <aburgess@redhat.com>
To: Simon Marchi <simon.marchi@polymtl.ca>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] gdb: unbuffer all input streams when not using readline
Message-ID: <20220118170913.GH622389@redhat.com>
References: <20220117164051.1854133-1-aburgess@redhat.com>
 <ebeb5097-057d-6681-bb6f-e2fadb776cab@polymtl.ca>
MIME-Version: 1.0
In-Reply-To: <ebeb5097-057d-6681-bb6f-e2fadb776cab@polymtl.ca>
X-Operating-System: Linux/5.8.18-100.fc31.x86_64 (x86_64)
X-Uptime: 16:48:45 up 17 days,  1:42,
X-Editor: GNU Emacs [ http://www.gnu.org/software/emacs ]
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Spam-Status: No, score=-6.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW,
 RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Jan 2022 17:09:26 -0000

* Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> [2022-01-18 11:26:42 -0500]:

> 
> 
> 
> 
> On 2022-01-17 11:40, Andrew Burgess via Gdb-patches wrote:
> 
> > This commit should fix PR gdb/28711.  What's actually going on is
> 
> > pretty involved, and there's still a bit of the story that I don't
> 
> > understand completely, however, from my observed results, I think that
> 
> > the change I propose making here (or something very similar) is going
> 
> > to be needed.
> 
> > 
> 
> > The original bug report involves using eclipse to drive gdb using mi
> 
> > commands.  A separate tty is spun off in which to send gdb the mi
> 
> > commands, this tty is created using the new-ui command.
> 
> > 
> 
> > The behaviour observed is that, given a particular set of mi commands
> 
> > being sent to gdb, we sometimes see an ESPIPE error from a lseek
> 
> > call, which ultimately results in gdb terminating.
> 
> > 
> 
> > The problems all originate from gdb_readline_no_editing_callback in
> 
> > gdb/event-top.c, where we can (sometimes) perform calls to fgetc, and
> 
> > allow glibc to perform buffering on the FILE object being used.
> 
> > 
> 
> > I say sometime, because, gdb_readline_no_editing_callback already
> 
> > includes a call to disable the glibc buffering, but this is only done
> 
> > if the input stream is not a tty.  In our case the input stream is a
> 
> > tty, so the buffering is left in place.
> 
> > 
> 
> > The first step to understanding why this problem occurs is to
> 
> > understand that eclipse sends multiple commands to gdb very quickly
> 
> > without waiting for and answer to each command, eclipse plans to
> 
> > collect all of the command results after sending all the commands to
> 
> > gdb.  In fact, eclipse sends the commands to gdb that they appear to
> 
> > arrive in the gdb process as a single block of data.  When reproducing
> 
> > this issue within the testsuite I find it necessary to send multiple
> 
> > commands using a single write call.
> 
> > 
> 
> > The next bit of the story gets a little involved, and this is where my
> 
> > understanding is not complete.  I can describe the behaviour that I
> 
> > observe, and (for me at least) I'm happy that what I'm seeing, if a
> 
> > little strange, is consistent.  In order to fully understand what's
> 
> > going on I think I would likely need to dive into kernel code, which
> 
> > currently seems unnecessary given that I'm happy with the solution I'm
> 
> > proposing.
> 
> > 
> 
> > The following description all relates to input from a tty in which I'm
> 
> > not using readline.  I see the same problems either when using a
> 
> > new-ui tty, or with gdb's standard, non-readline, mi tty.
> 
> > 
> 
> > Here's what I observe happening when I send multiple commands to gdb
> 
> > using a single write, if I send gdb this:
> 
> > 
> 
> >   command_1\ncommand_2\ncommand_3
> 
> > 
> 
> > then gdb's event loop will wake up (from its select) as it sees there
> 
> > is input available.  We call into gdb_readline_no_editing_callback,
> 
> > where we call fgetc, glibc will do a single big read, and get back
> 
> > just:
> 
> > 
> 
> >   command_1\n
> 
> > 
> 
> > that is, despite there being multiple lines of input available, I
> 
> > consistently get just a single line.  From glibc a single character is
> 
> > returned from the fgetc call, and within gdb we accumulate characters,
> 
> > one at a time, into our own buffer.  Eventually gdb sees the '\n'
> 
> > character, and dispatches the whole 'command_1' into gdb's command
> 
> > handler, which processes the command and prints the result.  We then
> 
> > return to gdb_readline_no_editing_callback, which in turn returns to
> 
> > gdb's event loop where we re-enter the select.
> 
> > 
> 
> > Inside the select we immediately see that there is more input waiting
> 
> > on the input stream, drop out of the select, and call back into
> 
> > gdb_readline_no_editing_callback.  In this function we again call
> 
> > fgetc where glibc performs another big read.  This time glibc gets:
> 
> > 
> 
> >   command_2\n
> 
> > 
> 
> > that is, we once again get just a single line, despite there being a
> 
> > third line available.  Just like the first command we copy the whole
> 
> > string, character by character into gdb's buffer, then handle the
> 
> > command.  After handling the command we go to the event loop, enter,
> 
> > and then exit the select, and call back to the function
> 
> > gdb_readline_no_editing_callback.
> 
> > 
> 
> > In gdb_readline_no_editing_callback we again call fgetc, this time
> 
> > glibc gets the string:
> 
> > 
> 
> >   command_3\n
> 
> > 
> 
> > like before, we copy this to gdb's buffer and handle the command, then
> 
> > we return to the event loop.  At this point the select blocks while we
> 
> > wait for more input to arrive.
> 
> > 
> 
> > The important bit of this is that someone, somewhere is, it appears,
> 
> > taking care to split the incoming write into lines.
> 
> > 
> 
> > My next experiment is to try something like:
> 
> > 
> 
> >   this_is_a_very_long_command\nshort_command\n
> 
> > 
> 
> > However, I actually make 'this_is_a_very_long_command' very long, as
> 
> > in many hundreds of characters long.  One way to do this is:
> 
> > 
> 
> >   echo xxxxxx.....xxxxx
> 
> > 
> 
> > and just adding more and more 'x' characters as needed.  What I'm
> 
> > aiming for is to have the first command be longer than glibc's
> 
> > internal read buffer, which, on my machine, is 1024 characters.
> 
> > 
> 
> > However, for this discussion, lets imagine that glibc's buffer is just
> 
> > 8 characters (we can create just this situation by adding a suitable
> 
> > setbuf call into gdb_readline_no_editing_callback).
> 
> > 
> 
> > Now, if I send gdb this data:
> 
> > 
> 
> >   abcdefghij\nkl\n
> 
> > 
> 
> > The first read from glibc will get 'abcdefgh', that is a full 8
> 
> > character buffer.  Once gdb has copied these to its buffer we call
> 
> > fgetc again, and now glibc will get 'ij\n', that is, just like before,
> 
> > multiple lines are split at the '\n' character.  The full command,
> 
> > which is now in gdb's buffer can be handled 'abcdefghij', after which
> 
> > we go (via the event loop) back to gdb_readline_no_editing_callback.
> 
> > Now we call fgetc, and glibc will get 'kl\n', which is then handled in
> 
> > the normal way.
> 
> > 
> 
> > So far, so good.  However, there is, apparently, one edge case where
> 
> > the above rules don't apply.
> 
> > 
> 
> > If the '\n' character is the first character read from the kernel,
> 
> > then the incoming lines are not split up.  So, given glibc's 8
> 
> > character buffer, if I send gdb this:
> 
> > 
> 
> >   abcdefgh\nkl\n
> 
> > 
> 
> > that is the first command is 8 characters plus a newline, then, on the
> 
> > first read (from within glibc) we get 'abcdefgh' in a single buffer.
> 
> > As there's no newline gdb calls fgetc again, and glibc does another
> 
> > large read, now we get:
> 
> > 
> 
> >   \nkl\n
> 
> > 
> 
> > which doesn't follow the above pattern - the lines are not split into
> 
> > separate buffers!
> 
> > 
> 
> > So, gdb reads the first character from glibc using fgetc, this is the
> 
> > newline.  Now gdb has a complete command, and so the command is
> 
> > handled.  We then return to the event loop and enter the select.
> 
> > 
> 
> > The problem is that, as far as the kernel is concerned, there is no
> 
> > more input pending, it's all been read into glibc's buffer, and so the
> 
> > select doesn't return.  The second command is basically stuck in
> 
> > glibc's buffer.
> 
> > 
> 
> > If I send another command to gdb, or even just send an empty
> 
> > command (a lone newline) then the select returns, we call into
> 
> > gdb_readline_no_editing_callback, and now gdb sees the second
> 
> > command.
> 
> > 
> 
> > OK, so the above is interesting, but it doesn't explain the ESPIPE
> 
> > error.
> 
> > 
> 
> > Well, that's a slightly different, but related issue.  The ESPIPE
> 
> > case will _only_ show up when using new-ui to create the separate tty
> 
> > for mi commands, and is a consequence of this commit:
> 
> > 
> 
> >   commit afe09f0b6311a4dd1a7e2dc6491550bb228734f8
> 
> >   Date:   Thu Jul 18 17:20:04 2019 +0100
> 
> > 
> 
> >       Fix for using named pipes on Windows
> 
> > 
> 
> > Prior to this commit, the new-ui command would open the tty three
> 
> > times, once each for stdin, stderr, and stdout.  After this commit we
> 
> > open the tty just once and reuse the FILE object for all three roles.
> 
> > 
> 
> > Consider the problem case, where glibc has (unexpectedly) read the
> 
> > second command into its internal buffer.  When we handle the first
> 
> > command we usually end up having to write something to the mi output
> 
> > stream.
> 
> > 
> 
> > After the above commit the same FILE object represents both the input
> 
> > and output streams, so, when gdb tries to write to the FILE object,
> 
> > glibc spots that there is input pending within the input buffer, and
> 
> > so assumes that we have read ahead of where we should be in the input
> 
> > file.  To correct for this glibc tries to do an lseek call to
> 
> > reposition the file offset of the output stream prior to writing to
> 
> > it.  However, as the output stream is a tty, and seeking is not
> 
> > supported on a tty, this lseek call fails, this results in the ESPIPE,
> 
> > which ultimately causes gdb to terminate.
> 
> > 
> 
> > So, now we understand why the ESPIPE triggers (which was what caused
> 
> > the gdb crash in the original bug report), and we also understand that
> 
> > sometime gdb will not handle the second command in a timely
> 
> > fashion (if the first command is just the wrong length). So, what to
> 
> > do about all this?
> 
> > 
> 
> > We could revert the commit mentioned above (and implement its
> 
> > functionality another way).  This would certainly resolve the ESPIPE
> 
> > issue, the buffered input would now only be on the input stream, the
> 
> > output stream would have no buffered input, and so glibc would never
> 
> > try to lseek, and so we'd never get the ESPIPE error.
> 
> > 
> 
> > However, this only solves one of the two problems.  We would still
> 
> > suffer from the problem where, if the first command is just the wrong
> 
> > length, the second command will not (immediately) get handled.
> 
> > 
> 
> > The only solution I can see to this problem is to unbuffer the input
> 
> > stream.  If glibc is not buffering the input, but instead, we read
> 
> > incoming data character by character from the kernel, then everything
> 
> > will be fine.  As soon as we see the newline at the end of the first
> 
> > command we will handle the first command.  As glibc will have no
> 
> > buffered input it will not be tempted to lseek, so no ESPIPE error.
> 
> > When we go have to the event loop there will be more data pending in
> 
> > the kernel, so the select will immediately return, and the second
> 
> > command will be processed.
> 
> > 
> 
> > I'm tempted to suggest that we should move the unbuffering of the
> 
> > input stream out of gdb_readline_no_editing_callback and do it
> 
> > somewhere earlier, more like when we create the input streams.
> 
> > However, I've not done that in this commit for a couple of reasons:
> 
> > 
> 
> >   1. By keeping the unbuffering in gdb_readline_no_editing_callback
> 
> >   I'm making the smallest possible change that fixes the bug.  Moving
> 
> >   the unbuffering somewhere better can be done as a refactor later, if
> 
> >   that 's felt to be important,
> 
> > 
> 
> >   2. I don't think making repeated calls to unbuffer the input will
> 
> >   have that much performance impact.  We only make the unbuffer call
> 
> >   once per call to gdb_readline_no_editing_callback, and, if the input
> 
> >   stream is already unbuffered we'll return pretty quickly, so I don't
> 
> >   see this as being massively costly,
> 
> > 
> 
> >   3. Tom is currently doing lots of gdb stream management changes and
> 
> >   I want to minimise the chances we'll conflict.
> 
> > 
> 
> > So, this commit just changes gdb_readline_no_editing_callback to
> 
> > always unbuffer the input stream.
> 
> > 
> 
> > The test for this issue sends two commands in a loop, with the first
> 
> > command growing bigger each time around the loop.  I actually make the
> 
> > first command bigger by just adding whitespace to the front, as gdb
> 
> > still has to read the complete command (including whitespace) via
> 
> > glibc, so this is enough to trigger the bug.
> 
> > 
> 
> > The original bug was reported when using a virtual machine, and in
> 
> > this situation we see this in the strace output:
> 
> > 
> 
> >   read(9, "70-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 1024) = 64
> 
> >   read(9, "\n71-var-info-path-expression var1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n", 1024) = 67
> 
> > 
> 
> > I'm not completely sure what's going on here, but it appears that the
> 
> > kernel on the virtual machine is delivering the input to glibc slower
> 
> > than I see on my real hardware; glibc asks for 1024 bytes, but only
> 
> > gets 64 bytes the first time.  In the second read we see the problem
> 
> > case, the first character is the newline, but then the entire second
> 
> > command is included.
> 
> > 
> 
> > If I run this exact example on my real hardware then the first command
> 
> > would not be truncated at 64 bytes, instead, I'd expect to see the
> 
> > newline included in the first read, with the second command split into
> 
> > a second read.
> 
> > 
> 
> > So, for testing, I check cases where the first command is just a few
> 
> > characters (starting at 8 character), all the way up to 2048
> 
> > characters.  Hopefully, this should mean we hit the problem case for
> 
> > most machine setups.
> 
> > 
> 
> > The only last question relates to commit afe09f0b6311a4d that I
> 
> > mentioned earlier.  That commit was intended to provide support for
> 
> > Microsoft named pipes:
> 
> > 
> 
> >   https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes
> 
> > 
> 
> > I know next to nothing about this topic beyond a brief scan of the
> 
> > above link, but I think these windows named pipe are closer in
> 
> > behaviour to unix sockets than to unix named fifos.
> 
> > 
> 
> > I am a little nervous that, after the above commit, we now use the
> 
> > same FILE for in, err, and out streams.  In contrast, in a vanilla C
> 
> > program, I would expect different FILE objects for each stream.
> 
> > 
> 
> > Still, I'm reluctant to revert the above commit (and provide the same
> 
> > functionality a different way) without a specific bug to point at,
> 
> > and, now that the streams are unbuffered, I expect a lot of the read
> 
> > and write calls are going straight to the kernel with minimal glibc
> 
> > involvement, so maybe it doesn't really matter.  Anyway, I haven't
> 
> > touched the above patch, but it is something to keep in mind when
> 
> > working in this area.
> 
> > 
> 
> > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28711
> 
> 
> 
> The change looks ok to me (better than the status quo), given that
> 
> correctness is more important than performance.
> 
> 
> 
> I'm just wondering if there's a noticeable performance difference
> 
> between having the input buffered vs unbuffered.  Calling fgetc with
> 
> unbuffered input means we do one syscall per character.  With frontends
> 
> sending tons of commands, it could possibly affect the responsiveness
> 
> and degrade user experience.  But it's just a guess, we should be able
> 
> to measure it.
> 
> 
> 
> How difficult would it be to just have different behaviors on Windows,
> 
> opening the stream only once, vs Linux, opening three streams?

I think it would be easy enough to implement new-ui differently for
unix vs for windows.

But, if I've left you with the impression that this is a possible
solution, then I failed with my commit message.  Even on Linux, when
we're not using new-ui at all, we currently can loose commands unless
we switch to unbuffered input.

Another option; maybe there's some ioctl that can be used to tell the
kernel that a newline at the start of a read request should be
considered a line break.  However, if such an ioctl exits, I've failed
to find it.

The final option that I considered is to place glibc into unbuffered
mode, and then do the buffering ourselves within gdb.  This obviously
adds a load more complexity which is why I'd rather not do this.

But, if we did do this, then at the end of
gdb_readline_no_editing_callback we could set the flag
call_stdin_event_handler_again_p, and things would "just work".

I can take a look at gathering some performance figures for sure and
will update once I have some data, but this patch is fixing an
existing bug...

Thanks,
Andrew