From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <simark@simark.ca>
Received: from simark.ca (simark.ca [158.69.221.121])
 by sourceware.org (Postfix) with ESMTPS id BA6193858D28
 for <gdb-patches@sourceware.org>; Tue, 23 Nov 2021 22:03:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BA6193858D28
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=simark.ca
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark.ca
Received: from [172.16.0.95] (192-222-180-24.qc.cable.ebox.net
 [192.222.180.24])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by simark.ca (Postfix) with ESMTPSA id 3EE991EDF0;
 Tue, 23 Nov 2021 17:03:01 -0500 (EST)
Subject: Re: [PATCH 4/4] gdb/remote: some fixes for 'maint set target-async
 off'
To: Andrew Burgess <aburgess@redhat.com>, gdb-patches@sourceware.org
References: <cover.1637676250.git.aburgess@redhat.com>
 <01e8c6d962f5112861a1ebe07a60cef95b79ac11.1637676250.git.aburgess@redhat.com>
From: Simon Marchi <simark@simark.ca>
Message-ID: <b2fbfef2-5881-fcbb-38e8-00c741b58990@simark.ca>
Date: Tue, 23 Nov 2021 17:03:00 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.14.0
MIME-Version: 1.0
In-Reply-To: <01e8c6d962f5112861a1ebe07a60cef95b79ac11.1637676250.git.aburgess@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Language: tl
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 NICE_REPLY_A, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 23 Nov 2021 22:03:03 -0000

On 2021-11-23 9:08 a.m., Andrew Burgess via Gdb-patches wrote:
> While working on another patch relating to remote targets, I wanted to
> test with 'maint set target-async off' in place.  Unfortunately I ran
> into some problems.  This commit is an attempt to fix one of the
> issues I hit.
>
> In my particular case I was actually running with:
>
>   maint set target-async off
>   maint set target-non-stop off
>
> that is, we're telling GDB to force the targets to operate in
> non-async mode, and in all-stop mode.  Here's my GDB session showing
> the problem:
>
>   (gdb) maintenance set target-async off
>   (gdb) maintenance set target-non-stop off
>   (gdb) target extended-remote :54321
>   Remote debugging using :54321
>   (gdb) attach 2365960
>   Attaching to process 2365960
>   No unwaited-for children left.
>   (gdb)
>
> Notice the 'No unwaited-for children left.' error, this is the
> problem.  There's no reason why GDB should not be able to attach to
> the process.
>
> The problem is this:
>
>   1. The user runs 'attach PID' and this sends GDB into attach_command
>   in infcmd.c.  From here we call the ::attach method on the attach
>   target, which will be the extended_remote_target.
>
>   2. In extended_remote_target::attach, we attach to the remote target
>   and get the first reply (which is a stop packet).  We put off
>   processing the stop packet until the end of ::attach.  We setup the
>   inferior and thread to represent the process we attached to, and
>   download the target description.  Finally, we process the initial
>   stop packet.
>
>   If '!target_is_non_stop_p ()' and '!target_can_async_p ()', which is
>   the case for us given the maintenance commands we used, we cache the
>   stop packet within the remote_state::buf for later processing.
>
>   3. Back in attach_command, if 'target_is_non_stop_p ()' then we
>   request that the target stops.  This will either process any cached
>   stop replies, or request that the target stops, and process the stop
>   replies.  However, this code is not what we use due to non-stop mode
>   being disabled.  So, we skip to the next step which is to call
>   validate_exec_file.
>
>   4. Calling validate_exec_file can cause packets to be sent to the
>   remote target, and replies received, the first path I hit is the
>   call to target_pid_to_exec_file, which calls
>   remote_target::pid_to_exec_file, which can then try to read the
>   executable from the remote.  Sending an receiving packets will make
>   use of the remote_state::buf object.
>
>   5. The attempt to attach continues, but the damage is already done...
>
> So, the problem is that, in step #2 we cache a stop reply in the
> remote_state::buf, and then in step #4 we reuse the remote_state::buf
> object, discarding any cached stop reply.  As a result, the initial
> stop, which is sent when GDB first attaches to the target, is lost.
>
> This problem can clearly be seen I feel by looking at the
> remote_state::cached_wait_status flag.  This flag tells GDB if there
> is a wait status cached in remote_state::buf.  However, in
> remote_target::putpkt_binary and remote_target::getpkt_or_notif_sane_1
> this flag is just set back to 0, doing this immediately discards any
> cached data.
>
> I don't know if this scheme ever made sense, maybe once upon a time,
> GDB would, when it found it had no cached stop, simply re-request the
> stop information from the target, however, this certainly isn't the
> case now, and resetting the cached_wait_status is (I claim) a bad
> thing.
>
> So, my first step toward fixing this issue was to replace the two
> instances of 'rs->cached_wait_status = 0;' in ::putpkt_binary and
> ::getpkt_or_notif_sane_1 with 'gdb_assert (rs->cached_wait_status ==
> 0);', this, at least would show me when GDB was doing something
> dangerous, and indeed, this assert is now hit in my test case above.

Good idea.

>
> I did play with using some kind of scoped restore to backup, and
> restore the remote_state::buf object in all the places within remote.c
> that I was hitting where the ::buf was being corrupted.  The first
> problem with this is that, where the ::cached_wait_status flag is
> reset is _not_ where ::buf is corrupted.  For the ::putpkt_binary
> case, by the time we get to the method the buffer has already been
> corrupted in many cases, so we end up needing to add the scoped
> save/restore within the callers, which means we need the save/restore
> in _lots_ of places.
>
> Plus, using this save/restore model feels like the wrong solution.  I
> don't think that it's obvious that the buffer might be holding cached
> data, and I think it would be too easy for new corruptions of the
> buffer to be introduced, which could easily go unnoticed for a long
> time.
>
> So, I really wanted a solution that didn't require us to cache data in
> the ::buf object.
>
> Luckily, I think we already have such a solution in place, the
> remote_state::stop_reply_queue, it seems like this does exactly the
> same task, just in a slightly different way.  With the
> ::stop_reply_queue, the stop packets are processed upon receipt and
> the stop_reply object is added to the queue.  With the ::buf cache
> solution, the unprocessed stop reply is cached in the ::buf, and
> processed later.
>
> So, finally, in this commit, I propose to remove the
> remote_state::cached_wait_status flag and to stop using the ::buf to
> cache stop replies.  Instead, stop replies will now always be stored
> in the ::stop_reply_queue.
>
> There are two places where we use the ::buf to hold a cached stop
> reply, the first is in the ::attach method, and the second is in
> remote_target::start_remote, however, the second of these cases is far
> less problematic, as after caching the stop reply in ::buf we call the
> global start_remote function, which does very little work before
> calling normal_stop, which processes the cached stop reply.  However,
> my plan is to switch both users over to using ::stop_reply_queue so
> that the old (unsafe) ::cached_wait_status mechanism can be completely
> removed.
>
> The next problem is that the ::stop_reply_queue is currently only used
> for async-mode, and so, in remote_target::push_stop_reply, where we
> push stop_reply objects into the ::stop_reply_queue, we currently also
> mark the async event token.  I've modified this so we only mark the
> async event token if 'target_can_async_p ()' - note, _can_, not _is_
> here, ::push_stop_reply is called in places where async mode has been
> temporarily disabled, but, in general, the target can (and will) be
> switched back into async mode.
>
> Another change of interest is in remote_target::remote_interrupt_as.
> Previously this code checked ::cached_wait_status, but didn't check
> for events in the ::stop_reply_queue.  Now that ::cached_wait_status
> has been removed we now check the queue length instead, which should
> have the same result.
>
> Finally, in remote_target::wait_as, I've tried to merge the processing
> of the ::stop_reply_queue with how we used to handle the
> ::cached_wait_status flag.
>
> Currently, when processing the ::stop_reply_queue we call
> process_stop_reply and immediately return.  However, when handling
> ::cached_wait_status we run through the whole of ::wait_as, and return
> at the end of the function.
>
> If we consider a standard stop packet, the two differences I see are:
>
>   1. Resetting of the remote_state::waiting_for_stop_reply, flag; this
>   is not currently done when processing a stop from the
>   ::stop_reply_queue.
>
>   2. The final return value has the possibility of being adjusted at
>   the end of ::wait_as, as well as there being calls to
>   record_currthread, non of which are done if we process a stop from
>   the ::stop_reply_queue.
>
> After this commit I have unified these two paths, that is, when we
> process a packet from ::stop_reply_queue we now reset the flag
> mentioned in #1 above, and pass through the return value adjustment
> code at the end of ::wait_as.
>
> An example of a test that reveals the benefits of this commit is:
>
>   make check-gdb \
>     RUNTESTFLAGS="--target_board=native-extended-gdbserver \
>                   GDBFLAGS='-ex maint\ set\ target-async\ off \
>                             -ex maint\ set\ target-non-stop\ off' \
>                   gdb.base/attach.exp"
>
> For testing I've been running test on x86-64/GNU Linux, and run with
> target boards unix, native-gdbserver, and native-extended-gdbserver.
> For each board I've run with the default GDBFLAGS, as well as with:
>
>   GDBFLAGS='-ex maint\ set\ target-async\ off \
>             -ex maint\ set\ target-non-stop\ off' \
>
> Though running with the above GDBFLAGS is clearly a lot more unstable
> both before and after my patch, I'm not seeing any consistent new
> failures with my patch, except, with the native-extended-gdbserver
> board, where I am seeing new failures, but only because more tests are
> now running.  For that configuration alone I see the number of
> unresolved go down by 49, the number of passes goes up by 446, and the
> number of failures also increases by 144.  All of the failures are new
> tests as far as I can tell.

I don't have time to look at the code right now, but I read the commit
message, I think the direction makes sense.

Simon