public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Simon Marchi <simark@simark.ca>
To: Andrew Burgess <aburgess@redhat.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH 4/4] gdb/remote: some fixes for 'maint set target-async off'
Date: Tue, 23 Nov 2021 17:03:00 -0500	[thread overview]
Message-ID: <b2fbfef2-5881-fcbb-38e8-00c741b58990@simark.ca> (raw)
In-Reply-To: <01e8c6d962f5112861a1ebe07a60cef95b79ac11.1637676250.git.aburgess@redhat.com>

On 2021-11-23 9:08 a.m., Andrew Burgess via Gdb-patches wrote:
> While working on another patch relating to remote targets, I wanted to
> test with 'maint set target-async off' in place.  Unfortunately I ran
> into some problems.  This commit is an attempt to fix one of the
> issues I hit.
>
> In my particular case I was actually running with:
>
>   maint set target-async off
>   maint set target-non-stop off
>
> that is, we're telling GDB to force the targets to operate in
> non-async mode, and in all-stop mode.  Here's my GDB session showing
> the problem:
>
>   (gdb) maintenance set target-async off
>   (gdb) maintenance set target-non-stop off
>   (gdb) target extended-remote :54321
>   Remote debugging using :54321
>   (gdb) attach 2365960
>   Attaching to process 2365960
>   No unwaited-for children left.
>   (gdb)
>
> Notice the 'No unwaited-for children left.' error, this is the
> problem.  There's no reason why GDB should not be able to attach to
> the process.
>
> The problem is this:
>
>   1. The user runs 'attach PID' and this sends GDB into attach_command
>   in infcmd.c.  From here we call the ::attach method on the attach
>   target, which will be the extended_remote_target.
>
>   2. In extended_remote_target::attach, we attach to the remote target
>   and get the first reply (which is a stop packet).  We put off
>   processing the stop packet until the end of ::attach.  We setup the
>   inferior and thread to represent the process we attached to, and
>   download the target description.  Finally, we process the initial
>   stop packet.
>
>   If '!target_is_non_stop_p ()' and '!target_can_async_p ()', which is
>   the case for us given the maintenance commands we used, we cache the
>   stop packet within the remote_state::buf for later processing.
>
>   3. Back in attach_command, if 'target_is_non_stop_p ()' then we
>   request that the target stops.  This will either process any cached
>   stop replies, or request that the target stops, and process the stop
>   replies.  However, this code is not what we use due to non-stop mode
>   being disabled.  So, we skip to the next step which is to call
>   validate_exec_file.
>
>   4. Calling validate_exec_file can cause packets to be sent to the
>   remote target, and replies received, the first path I hit is the
>   call to target_pid_to_exec_file, which calls
>   remote_target::pid_to_exec_file, which can then try to read the
>   executable from the remote.  Sending an receiving packets will make
>   use of the remote_state::buf object.
>
>   5. The attempt to attach continues, but the damage is already done...
>
> So, the problem is that, in step #2 we cache a stop reply in the
> remote_state::buf, and then in step #4 we reuse the remote_state::buf
> object, discarding any cached stop reply.  As a result, the initial
> stop, which is sent when GDB first attaches to the target, is lost.
>
> This problem can clearly be seen I feel by looking at the
> remote_state::cached_wait_status flag.  This flag tells GDB if there
> is a wait status cached in remote_state::buf.  However, in
> remote_target::putpkt_binary and remote_target::getpkt_or_notif_sane_1
> this flag is just set back to 0, doing this immediately discards any
> cached data.
>
> I don't know if this scheme ever made sense, maybe once upon a time,
> GDB would, when it found it had no cached stop, simply re-request the
> stop information from the target, however, this certainly isn't the
> case now, and resetting the cached_wait_status is (I claim) a bad
> thing.
>
> So, my first step toward fixing this issue was to replace the two
> instances of 'rs->cached_wait_status = 0;' in ::putpkt_binary and
> ::getpkt_or_notif_sane_1 with 'gdb_assert (rs->cached_wait_status ==
> 0);', this, at least would show me when GDB was doing something
> dangerous, and indeed, this assert is now hit in my test case above.

Good idea.

>
> I did play with using some kind of scoped restore to backup, and
> restore the remote_state::buf object in all the places within remote.c
> that I was hitting where the ::buf was being corrupted.  The first
> problem with this is that, where the ::cached_wait_status flag is
> reset is _not_ where ::buf is corrupted.  For the ::putpkt_binary
> case, by the time we get to the method the buffer has already been
> corrupted in many cases, so we end up needing to add the scoped
> save/restore within the callers, which means we need the save/restore
> in _lots_ of places.
>
> Plus, using this save/restore model feels like the wrong solution.  I
> don't think that it's obvious that the buffer might be holding cached
> data, and I think it would be too easy for new corruptions of the
> buffer to be introduced, which could easily go unnoticed for a long
> time.
>
> So, I really wanted a solution that didn't require us to cache data in
> the ::buf object.
>
> Luckily, I think we already have such a solution in place, the
> remote_state::stop_reply_queue, it seems like this does exactly the
> same task, just in a slightly different way.  With the
> ::stop_reply_queue, the stop packets are processed upon receipt and
> the stop_reply object is added to the queue.  With the ::buf cache
> solution, the unprocessed stop reply is cached in the ::buf, and
> processed later.
>
> So, finally, in this commit, I propose to remove the
> remote_state::cached_wait_status flag and to stop using the ::buf to
> cache stop replies.  Instead, stop replies will now always be stored
> in the ::stop_reply_queue.
>
> There are two places where we use the ::buf to hold a cached stop
> reply, the first is in the ::attach method, and the second is in
> remote_target::start_remote, however, the second of these cases is far
> less problematic, as after caching the stop reply in ::buf we call the
> global start_remote function, which does very little work before
> calling normal_stop, which processes the cached stop reply.  However,
> my plan is to switch both users over to using ::stop_reply_queue so
> that the old (unsafe) ::cached_wait_status mechanism can be completely
> removed.
>
> The next problem is that the ::stop_reply_queue is currently only used
> for async-mode, and so, in remote_target::push_stop_reply, where we
> push stop_reply objects into the ::stop_reply_queue, we currently also
> mark the async event token.  I've modified this so we only mark the
> async event token if 'target_can_async_p ()' - note, _can_, not _is_
> here, ::push_stop_reply is called in places where async mode has been
> temporarily disabled, but, in general, the target can (and will) be
> switched back into async mode.
>
> Another change of interest is in remote_target::remote_interrupt_as.
> Previously this code checked ::cached_wait_status, but didn't check
> for events in the ::stop_reply_queue.  Now that ::cached_wait_status
> has been removed we now check the queue length instead, which should
> have the same result.
>
> Finally, in remote_target::wait_as, I've tried to merge the processing
> of the ::stop_reply_queue with how we used to handle the
> ::cached_wait_status flag.
>
> Currently, when processing the ::stop_reply_queue we call
> process_stop_reply and immediately return.  However, when handling
> ::cached_wait_status we run through the whole of ::wait_as, and return
> at the end of the function.
>
> If we consider a standard stop packet, the two differences I see are:
>
>   1. Resetting of the remote_state::waiting_for_stop_reply, flag; this
>   is not currently done when processing a stop from the
>   ::stop_reply_queue.
>
>   2. The final return value has the possibility of being adjusted at
>   the end of ::wait_as, as well as there being calls to
>   record_currthread, non of which are done if we process a stop from
>   the ::stop_reply_queue.
>
> After this commit I have unified these two paths, that is, when we
> process a packet from ::stop_reply_queue we now reset the flag
> mentioned in #1 above, and pass through the return value adjustment
> code at the end of ::wait_as.
>
> An example of a test that reveals the benefits of this commit is:
>
>   make check-gdb \
>     RUNTESTFLAGS="--target_board=native-extended-gdbserver \
>                   GDBFLAGS='-ex maint\ set\ target-async\ off \
>                             -ex maint\ set\ target-non-stop\ off' \
>                   gdb.base/attach.exp"
>
> For testing I've been running test on x86-64/GNU Linux, and run with
> target boards unix, native-gdbserver, and native-extended-gdbserver.
> For each board I've run with the default GDBFLAGS, as well as with:
>
>   GDBFLAGS='-ex maint\ set\ target-async\ off \
>             -ex maint\ set\ target-non-stop\ off' \
>
> Though running with the above GDBFLAGS is clearly a lot more unstable
> both before and after my patch, I'm not seeing any consistent new
> failures with my patch, except, with the native-extended-gdbserver
> board, where I am seeing new failures, but only because more tests are
> now running.  For that configuration alone I see the number of
> unresolved go down by 49, the number of passes goes up by 446, and the
> number of failures also increases by 144.  All of the failures are new
> tests as far as I can tell.

I don't have time to look at the code right now, but I read the commit
message, I think the direction makes sense.

Simon

  reply	other threads:[~2021-11-23 22:03 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23 14:08 [PATCH 0/4] Improve 'maint set target-async off' for remote targets Andrew Burgess
2021-11-23 14:08 ` [PATCH 1/4] gdb: add asserts in target.c for target_async_permitted Andrew Burgess
2021-11-23 21:31   ` Simon Marchi
2021-11-23 14:08 ` [PATCH 2/4] gdb/remote: merge ::is_async_p and ::can_async_p methods Andrew Burgess
2021-11-23 21:33   ` Simon Marchi
2021-11-24 10:14     ` Andrew Burgess
2021-11-23 14:08 ` [PATCH 3/4] gdb: add assert in remote_target::wait relating to async being off Andrew Burgess
2021-11-23 21:50   ` Simon Marchi
2021-11-23 14:08 ` [PATCH 4/4] gdb/remote: some fixes for 'maint set target-async off' Andrew Burgess
2021-11-23 22:03   ` Simon Marchi [this message]
2021-11-23 16:39 ` [PATCH 0/4] Improve 'maint set target-async off' for remote targets John Baldwin
2021-11-24 12:22 ` [PATCHv2 0/6] " Andrew Burgess
2021-11-24 12:22   ` [PATCHv2 1/6] gdb: introduce a new overload of target_can_async_p Andrew Burgess
2021-11-24 12:22   ` [PATCHv2 2/6] gdb: hoist target_async_permitted checks into target.c Andrew Burgess
2021-11-24 21:17     ` Simon Marchi
2021-11-24 12:22   ` [PATCHv2 3/6] gdb: add asserts in target.c for target_async_permitted Andrew Burgess
2021-11-24 21:21     ` Simon Marchi
2021-11-24 12:22   ` [PATCHv2 4/6] gdb: simplify remote_target::is_async_p Andrew Burgess
2021-11-24 12:22   ` [PATCHv2 5/6] gdb: add assert in remote_target::wait relating to async being off Andrew Burgess
2021-11-24 21:23     ` Simon Marchi
2021-11-25 10:04       ` Andrew Burgess
2021-11-25 11:36         ` Tom de Vries
2021-11-25 13:46           ` Andrew Burgess
2021-11-25 14:17             ` Tom de Vries
2021-11-24 12:22   ` [PATCHv2 6/6] gdb/remote: some fixes for 'maint set target-async off' Andrew Burgess
2021-12-01 10:40   ` [PATCHv3 0/2] Improve 'maint set target-async off' for remote targets Andrew Burgess
2021-12-01 10:40     ` [PATCHv3 1/2] gdb/remote: some fixes for 'maint set target-async off' Andrew Burgess
2021-12-16 21:15       ` Pedro Alves
2021-12-17 13:35         ` Andrew Burgess
2021-12-17 14:05           ` Pedro Alves
2021-12-18 10:21             ` Andrew Burgess
2021-12-01 10:40     ` [PATCHv3 2/2] gdb: add assert in remote_target::wait relating to async being off Andrew Burgess
2021-12-16 21:16       ` Pedro Alves
2021-12-18 10:21         ` Andrew Burgess
2021-12-13 11:41     ` PING: [PATCHv3 0/2] Improve 'maint set target-async off' for remote targets Andrew Burgess
2021-12-13 17:20     ` John Baldwin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b2fbfef2-5881-fcbb-38e8-00c741b58990@simark.ca \
    --to=simark@simark.ca \
    --cc=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).