From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca (simark.ca [158.69.221.121]) by sourceware.org (Postfix) with ESMTPS id BA6193858D28 for ; Tue, 23 Nov 2021 22:03:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BA6193858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=simark.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark.ca Received: from [172.16.0.95] (192-222-180-24.qc.cable.ebox.net [192.222.180.24]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id 3EE991EDF0; Tue, 23 Nov 2021 17:03:01 -0500 (EST) Subject: Re: [PATCH 4/4] gdb/remote: some fixes for 'maint set target-async off' To: Andrew Burgess , gdb-patches@sourceware.org References: <01e8c6d962f5112861a1ebe07a60cef95b79ac11.1637676250.git.aburgess@redhat.com> From: Simon Marchi Message-ID: Date: Tue, 23 Nov 2021 17:03:00 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <01e8c6d962f5112861a1ebe07a60cef95b79ac11.1637676250.git.aburgess@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Language: tl Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, NICE_REPLY_A, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Nov 2021 22:03:03 -0000 On 2021-11-23 9:08 a.m., Andrew Burgess via Gdb-patches wrote: > While working on another patch relating to remote targets, I wanted to > test with 'maint set target-async off' in place. Unfortunately I ran > into some problems. This commit is an attempt to fix one of the > issues I hit. > > In my particular case I was actually running with: > > maint set target-async off > maint set target-non-stop off > > that is, we're telling GDB to force the targets to operate in > non-async mode, and in all-stop mode. Here's my GDB session showing > the problem: > > (gdb) maintenance set target-async off > (gdb) maintenance set target-non-stop off > (gdb) target extended-remote :54321 > Remote debugging using :54321 > (gdb) attach 2365960 > Attaching to process 2365960 > No unwaited-for children left. > (gdb) > > Notice the 'No unwaited-for children left.' error, this is the > problem. There's no reason why GDB should not be able to attach to > the process. > > The problem is this: > > 1. The user runs 'attach PID' and this sends GDB into attach_command > in infcmd.c. From here we call the ::attach method on the attach > target, which will be the extended_remote_target. > > 2. In extended_remote_target::attach, we attach to the remote target > and get the first reply (which is a stop packet). We put off > processing the stop packet until the end of ::attach. We setup the > inferior and thread to represent the process we attached to, and > download the target description. Finally, we process the initial > stop packet. > > If '!target_is_non_stop_p ()' and '!target_can_async_p ()', which is > the case for us given the maintenance commands we used, we cache the > stop packet within the remote_state::buf for later processing. > > 3. Back in attach_command, if 'target_is_non_stop_p ()' then we > request that the target stops. This will either process any cached > stop replies, or request that the target stops, and process the stop > replies. However, this code is not what we use due to non-stop mode > being disabled. So, we skip to the next step which is to call > validate_exec_file. > > 4. Calling validate_exec_file can cause packets to be sent to the > remote target, and replies received, the first path I hit is the > call to target_pid_to_exec_file, which calls > remote_target::pid_to_exec_file, which can then try to read the > executable from the remote. Sending an receiving packets will make > use of the remote_state::buf object. > > 5. The attempt to attach continues, but the damage is already done... > > So, the problem is that, in step #2 we cache a stop reply in the > remote_state::buf, and then in step #4 we reuse the remote_state::buf > object, discarding any cached stop reply. As a result, the initial > stop, which is sent when GDB first attaches to the target, is lost. > > This problem can clearly be seen I feel by looking at the > remote_state::cached_wait_status flag. This flag tells GDB if there > is a wait status cached in remote_state::buf. However, in > remote_target::putpkt_binary and remote_target::getpkt_or_notif_sane_1 > this flag is just set back to 0, doing this immediately discards any > cached data. > > I don't know if this scheme ever made sense, maybe once upon a time, > GDB would, when it found it had no cached stop, simply re-request the > stop information from the target, however, this certainly isn't the > case now, and resetting the cached_wait_status is (I claim) a bad > thing. > > So, my first step toward fixing this issue was to replace the two > instances of 'rs->cached_wait_status = 0;' in ::putpkt_binary and > ::getpkt_or_notif_sane_1 with 'gdb_assert (rs->cached_wait_status == > 0);', this, at least would show me when GDB was doing something > dangerous, and indeed, this assert is now hit in my test case above. Good idea. > > I did play with using some kind of scoped restore to backup, and > restore the remote_state::buf object in all the places within remote.c > that I was hitting where the ::buf was being corrupted. The first > problem with this is that, where the ::cached_wait_status flag is > reset is _not_ where ::buf is corrupted. For the ::putpkt_binary > case, by the time we get to the method the buffer has already been > corrupted in many cases, so we end up needing to add the scoped > save/restore within the callers, which means we need the save/restore > in _lots_ of places. > > Plus, using this save/restore model feels like the wrong solution. I > don't think that it's obvious that the buffer might be holding cached > data, and I think it would be too easy for new corruptions of the > buffer to be introduced, which could easily go unnoticed for a long > time. > > So, I really wanted a solution that didn't require us to cache data in > the ::buf object. > > Luckily, I think we already have such a solution in place, the > remote_state::stop_reply_queue, it seems like this does exactly the > same task, just in a slightly different way. With the > ::stop_reply_queue, the stop packets are processed upon receipt and > the stop_reply object is added to the queue. With the ::buf cache > solution, the unprocessed stop reply is cached in the ::buf, and > processed later. > > So, finally, in this commit, I propose to remove the > remote_state::cached_wait_status flag and to stop using the ::buf to > cache stop replies. Instead, stop replies will now always be stored > in the ::stop_reply_queue. > > There are two places where we use the ::buf to hold a cached stop > reply, the first is in the ::attach method, and the second is in > remote_target::start_remote, however, the second of these cases is far > less problematic, as after caching the stop reply in ::buf we call the > global start_remote function, which does very little work before > calling normal_stop, which processes the cached stop reply. However, > my plan is to switch both users over to using ::stop_reply_queue so > that the old (unsafe) ::cached_wait_status mechanism can be completely > removed. > > The next problem is that the ::stop_reply_queue is currently only used > for async-mode, and so, in remote_target::push_stop_reply, where we > push stop_reply objects into the ::stop_reply_queue, we currently also > mark the async event token. I've modified this so we only mark the > async event token if 'target_can_async_p ()' - note, _can_, not _is_ > here, ::push_stop_reply is called in places where async mode has been > temporarily disabled, but, in general, the target can (and will) be > switched back into async mode. > > Another change of interest is in remote_target::remote_interrupt_as. > Previously this code checked ::cached_wait_status, but didn't check > for events in the ::stop_reply_queue. Now that ::cached_wait_status > has been removed we now check the queue length instead, which should > have the same result. > > Finally, in remote_target::wait_as, I've tried to merge the processing > of the ::stop_reply_queue with how we used to handle the > ::cached_wait_status flag. > > Currently, when processing the ::stop_reply_queue we call > process_stop_reply and immediately return. However, when handling > ::cached_wait_status we run through the whole of ::wait_as, and return > at the end of the function. > > If we consider a standard stop packet, the two differences I see are: > > 1. Resetting of the remote_state::waiting_for_stop_reply, flag; this > is not currently done when processing a stop from the > ::stop_reply_queue. > > 2. The final return value has the possibility of being adjusted at > the end of ::wait_as, as well as there being calls to > record_currthread, non of which are done if we process a stop from > the ::stop_reply_queue. > > After this commit I have unified these two paths, that is, when we > process a packet from ::stop_reply_queue we now reset the flag > mentioned in #1 above, and pass through the return value adjustment > code at the end of ::wait_as. > > An example of a test that reveals the benefits of this commit is: > > make check-gdb \ > RUNTESTFLAGS="--target_board=native-extended-gdbserver \ > GDBFLAGS='-ex maint\ set\ target-async\ off \ > -ex maint\ set\ target-non-stop\ off' \ > gdb.base/attach.exp" > > For testing I've been running test on x86-64/GNU Linux, and run with > target boards unix, native-gdbserver, and native-extended-gdbserver. > For each board I've run with the default GDBFLAGS, as well as with: > > GDBFLAGS='-ex maint\ set\ target-async\ off \ > -ex maint\ set\ target-non-stop\ off' \ > > Though running with the above GDBFLAGS is clearly a lot more unstable > both before and after my patch, I'm not seeing any consistent new > failures with my patch, except, with the native-extended-gdbserver > board, where I am seeing new failures, but only because more tests are > now running. For that configuration alone I see the number of > unresolved go down by 49, the number of passes goes up by 446, and the > number of failures also increases by 144. All of the failures are new > tests as far as I can tell. I don't have time to look at the code right now, but I read the commit message, I think the direction makes sense. Simon