From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca (simark.ca [158.69.221.121]) by sourceware.org (Postfix) with ESMTPS id 821693851898 for ; Sat, 19 Nov 2022 01:16:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 821693851898 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=simark.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark.ca Received: from [10.0.0.11] (unknown [217.28.27.60]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id D11401E0D3; Fri, 18 Nov 2022 20:16:53 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=simark.ca; s=mail; t=1668820614; bh=zulesLoHbJDLrvz45+AU5c7c+dEopTFQbVBh7nwveaI=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=H/8GneJc5T7g2UgPxS6i19cbyaNzfx550NYrnhRhefNCqfsdve0nUh1QUDWoY9DvP DSucTDDIBvKtAbV8TWyFvR4JjQCi3jw0QBkJibd6i4ivwgOr+AMuFZPVM44h1JC5ag vjbz3IhX0cRdgjtoXyEvqG3aVmcTpjsGWw1yO3oE= Message-ID: Date: Fri, 18 Nov 2022 20:16:53 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: [PATCH 8/8] gdb: disable commit resumed in target_kill Content-Language: en-US To: Andrew Burgess , Simon Marchi via Gdb-patches Cc: Simon Marchi References: <20221117194241.1776125-1-simon.marchi@efficios.com> <20221117194241.1776125-9-simon.marchi@efficios.com> <8735ag8k07.fsf@redhat.com> From: Simon Marchi In-Reply-To: <8735ag8k07.fsf@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,SPF_HELO_PASS,SPF_PASS,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 11/18/22 08:33, Andrew Burgess via Gdb-patches wrote: > Simon Marchi via Gdb-patches writes: > >> PR 28275 shows that doing a sequence of: >> >> - Run inferior in background (run &) >> - kill that inferior >> - Run again >> >> We get into this assertion: >> >> /home/smarchi/src/binutils-gdb/gdb/target.c:2590: internal-error: target_wait: Assertion `!proc_target->commit_resumed_state' failed. >> >> #0 internal_error_loc (file=0x5606b344e740 "/home/smarchi/src/binutils-gdb/gdb/target.c", line=2590, fmt=0x5606b344d6a0 "%s: Assertion `%s' failed.") at /home/smarchi/src/binutils-gdb/gdbsupport/errors.cc:54 >> #1 0x00005606b6296475 in target_wait (ptid=..., status=0x7fffb9390630, options=...) at /home/smarchi/src/binutils-gdb/gdb/target.c:2590 >> #2 0x00005606b5767a98 in startup_inferior (proc_target=0x5606bfccb2a0 , pid=3884857, ntraps=1, last_waitstatus=0x0, last_ptid=0x0) at /home/smarchi/src/binutils-gdb/gdb/nat/fork-inferior.c:482 >> #3 0x00005606b4e6c9c5 in gdb_startup_inferior (pid=3884857, num_traps=1) at /home/smarchi/src/binutils-gdb/gdb/fork-child.c:132 >> #4 0x00005606b50f14a5 in inf_ptrace_target::create_inferior (this=0x5606bfccb2a0 , exec_file=0x604000039f50 "/home/smarchi/build/binutils-gdb/gdb/test", allargs="", env=0x61500000a580, from_tty=1) >> at /home/smarchi/src/binutils-gdb/gdb/inf-ptrace.c:105 >> #5 0x00005606b53b6d23 in linux_nat_target::create_inferior (this=0x5606bfccb2a0 , exec_file=0x604000039f50 "/home/smarchi/build/binutils-gdb/gdb/test", allargs="", env=0x61500000a580, from_tty=1) >> at /home/smarchi/src/binutils-gdb/gdb/linux-nat.c:978 >> #6 0x00005606b512b79b in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:468 >> #7 0x00005606b512c236 in run_command (args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:526 >> >> When running the kill command, commit_resumed_state for the >> process_stratum_target (linux-nat, here) is true. After the kill, when >> there are no more threads, commit_resumed_state is still true, as >> nothing touches this flag during the kill operation. During the >> subsequent run command, run_command_1 does: >> >> scoped_disable_commit_resumed disable_commit_resumed ("running"); >> >> We would think that this would clear the commit_resumed_state flag of >> our native target, but that's not the case, because >> scoped_disable_commit_resumed iterates on non-exited inferiors in order >> to find active process targets. And after the kill, the inferior is >> exited, and the native target was unpushed from it anyway. So >> scoped_disable_commit_resumed doesn't touch the commit_resumed_state >> flag of the native target, it stays true. When reaching target_wait, in >> startup_inferior (to consume the initial expect stop events while the >> inferior is starting up and working its way through the shell), >> commit_resumed_state is true, breaking the contract saying that >> commit_resumed_state is always false when calling the targets' wait >> method. >> >> (note: to be correct, I think that startup_inferior should toggle >> commit_resumed between the target_wait and target_resume calls, but I'll >> ignore that for now) >> >> I can see multiple ways to fix this. In the end, we need >> commit_resumed_state to be cleared by the time we get to that >> target_wait. It could be done at the end of the kill command, or at the >> beginning of the run command. >> >> To keep things in a coherent state, I'd like to make it so that after >> the kill command, when the target is left with no threads, its >> commit_resumed_state flag is left to false. This way, we can keep >> working with the assumption that a target with no threads (and therefore >> no running threads) has commit_resumed_state == false. >> >> Do this by adding a scoped_disable_commit_resumed in target_kill. It >> clears the target's commit_resumed_state on entry, and leaves it false >> if the target does not have any resumed thread on exit. That means, >> even if the target has another inferior with stopped threads, >> commit_resumed_state will be left to false, which makes sense. >> >> Add a test that tries to cover various combinations of actions done >> while an inferior is running (and therefore while commit_resumed_state >> is true on the process target). >> >> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28275 >> Change-Id: I8e6fe6dc1f475055921520e58cab68024039a1e9 >> --- >> gdb/target.c | 8 +- >> .../gdb.base/run-control-while-bg-execution.c | 33 +++++ >> .../run-control-while-bg-execution.exp | 118 ++++++++++++++++++ >> 3 files changed, 157 insertions(+), 2 deletions(-) >> create mode 100644 gdb/testsuite/gdb.base/run-control-while-bg-execution.c >> create mode 100644 gdb/testsuite/gdb.base/run-control-while-bg-execution.exp >> >> diff --git a/gdb/target.c b/gdb/target.c >> index 4a22885b82f..f5c6212310a 100644 >> --- a/gdb/target.c >> +++ b/gdb/target.c >> @@ -907,8 +907,12 @@ add_deprecated_target_alias (const target_info &tinfo, const char *alias) >> void >> target_kill (void) >> { >> - current_inferior ()->top_target ()->kill (); >> -} >> + >> + /* Ensure that, if commit resumed for the target is currently true (threads >> + are running), if we kill the last running inferior, commit resumed ends up >> + false. */ >> + scoped_disable_commit_resumed disable ("killing"); current_inferior >> + ()->top_target ()->kill (); } > > I think something went wrong with the formatting here! Woops, fixed. > > I first read this comment before reading the commit message, and, > initially, the comment didn't make any sense to me. > > My thinking was, surely the scoped_disable_commit_resumed would set the > commit resumed state to false, but only for the duration of this scope. > When we leave the scope, the state will be set back to true... > > ...the subtlety of course, is that when we create the > scoped_disable_commit_resumed, the current inferior is non-exited, and > so has its state set to false, but, when we exit the scope, the current > inferior will be exited, so its commit resume state will be left false. > > Personally, I think this is a non-obvious detail. I think it would be > helpful if the comment explained this aspect a little more. I agree, trying to fit it all in a single sentence didn't help. What about: /* If the commit_resume_state of the to-be-killed-inferior's process stratum is true, and this inferior is the last live inferior with resumed threads of that target, then we want to leave commit_resume_state to false, as the target won't have any resumed threads anymore. We achieve this with this scoped_disable_commit_resumed. On construction, it will set the flag to false. On destruction, it will only set it to true if there are resumed threads left. */ > > I'm also seeing something weird with the test when run with the > native-extended-gdbserver board, this warning: > > WARNING: Timed out waiting for EOF in server after monitor exit > > I've not done a full investigation, but what seems to happen is GDB > sends the 'monitor exit', and gdbserver does indeed try to exit. It > then looks like pthreads(?) is waiting on some threads? Or some child > processes? At the point gdbserver is not exiting, the stack looks like: > > #0 0x00007f7009660374 in wait () from /lib64/libpthread.so.0 > #1 0x000056146a26a858 in reap_children () > #2 0x000056146a26b21c in new_job () > #3 0x000056146a27740c in update_file () > #4 0x000056146a277d64 in update_goal_chain () > #5 0x000056146a25b893 in main () I believe this is a stack trace of the make process, probably the one invoking dejagnu / expect. > > Do you also see this timeout? > > Additionally, I have not (yet) been able to reproduce this when running > the test from the command line - in that case, gdbserver always exits > immediately after getting the 'monitor exit'. Ah, funny that you dug into that too :). I noticed it and had a discussion about that with Pedro. Here's what we found. Doing "monitor exit" does indeed make gdbserver exit, after which (during the delay) it is zombie (it shows up as in ps). We expect this "eof" expect clause to trigger: https://gitlab.com/gnutools/binutils-gdb/-/blob/5e219e0f46055281cfbc9351a3d27a05841be34d/gdb/testsuite/lib/gdbserver-support.exp#L476 Our theory is that the "eof" clause triggers when the pty controlling GDBserver returns EOF, that is when nobody is connected anymore on the slave side. However, there is a child process that is detached during the test that is still alive, that is still connected to the pty. So eof never triggers, we hang until expect gives up. Actually, I have this workaround in the patch: + # Kill the detached process, otherwise that makes "monitor exit" hang + # until the process disappears. + #remote_exec target "kill $child_pid" I commented it out to investigate the issue, and forgot to uncomment it. I'll uncomment it. I think it would work fine to do the "wait" right after sending the "monitor exit", just like this: if { $is_mi } { set monitor_exit "-interpreter-exec console \"monitor exit\"" } else { set monitor_exit "monitor exit" } gdb_test -nopass "$monitor_exit" "" "monitor exit" wait -i $server_spawn_id We reap gdbserver even if the detach process is still alive. Worst case, the pty and detached process stay there until the detached process exits by itself (which generally happens with our testsuite programs, we try not to write infinite loops). The problem is that there is no timeout mechanism, so it could hang forever. Imagine the "monitor exit" fails for some reason, or GDBserver is stuck in a loop while trying to exit. That's the point of the current code. Note that this isn't unique to my test. I tried putting a fail in the eof clause and ran the testsuite, I had about 20 such failures when running the native-extended-gdbserver board. Maybe the solution is just to make sure the tests don't leave detached process hanging around. We can put a fail in there and fix the offending tests. Any new test that leaves child processes running will be easy to spot. Simon