public inbox for gdb-patches@sourceware.org
From: Simon Marchi <simark@simark.ca>
To: Andrew Burgess <aburgess@redhat.com>,
	Simon Marchi via Gdb-patches <gdb-patches@sourceware.org>
Cc: Simon Marchi <simon.marchi@efficios.com>
Subject: Re: [PATCH 8/8] gdb: disable commit resumed in target_kill
Date: Fri, 18 Nov 2022 20:16:53 -0500
Message-ID: <f30333a8-0958-ef5a-0444-5d3676d97eeb@simark.ca>
In-Reply-To: <8735ag8k07.fsf@redhat.com>



On 11/18/22 08:33, Andrew Burgess via Gdb-patches wrote:
> Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:
> 
>> PR 28275 shows that doing a sequence of:
>>
>>  - Run inferior in background (run &)
>>  - kill that inferior
>>  - Run again
>>
>> We get into this assertion:
>>
>>     /home/smarchi/src/binutils-gdb/gdb/target.c:2590: internal-error: target_wait: Assertion `!proc_target->commit_resumed_state' failed.
>>
>>     #0  internal_error_loc (file=0x5606b344e740 "/home/smarchi/src/binutils-gdb/gdb/target.c", line=2590, fmt=0x5606b344d6a0 "%s: Assertion `%s' failed.") at /home/smarchi/src/binutils-gdb/gdbsupport/errors.cc:54
>>     #1  0x00005606b6296475 in target_wait (ptid=..., status=0x7fffb9390630, options=...) at /home/smarchi/src/binutils-gdb/gdb/target.c:2590
>>     #2  0x00005606b5767a98 in startup_inferior (proc_target=0x5606bfccb2a0 <the_amd64_linux_nat_target>, pid=3884857, ntraps=1, last_waitstatus=0x0, last_ptid=0x0) at /home/smarchi/src/binutils-gdb/gdb/nat/fork-inferior.c:482
>>     #3  0x00005606b4e6c9c5 in gdb_startup_inferior (pid=3884857, num_traps=1) at /home/smarchi/src/binutils-gdb/gdb/fork-child.c:132
>>     #4  0x00005606b50f14a5 in inf_ptrace_target::create_inferior (this=0x5606bfccb2a0 <the_amd64_linux_nat_target>, exec_file=0x604000039f50 "/home/smarchi/build/binutils-gdb/gdb/test", allargs="", env=0x61500000a580, from_tty=1)
>>         at /home/smarchi/src/binutils-gdb/gdb/inf-ptrace.c:105
>>     #5  0x00005606b53b6d23 in linux_nat_target::create_inferior (this=0x5606bfccb2a0 <the_amd64_linux_nat_target>, exec_file=0x604000039f50 "/home/smarchi/build/binutils-gdb/gdb/test", allargs="", env=0x61500000a580, from_tty=1)
>>         at /home/smarchi/src/binutils-gdb/gdb/linux-nat.c:978
>>     #6  0x00005606b512b79b in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:468
>>     #7  0x00005606b512c236 in run_command (args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:526
>>
>> When running the kill command, commit_resumed_state for the
>> process_stratum_target (linux-nat, here) is true.  After the kill, when
>> there are no more threads, commit_resumed_state is still true, as
>> nothing touches this flag during the kill operation.  During the
>> subsequent run command, run_command_1 does:
>>
>>     scoped_disable_commit_resumed disable_commit_resumed ("running");
>>
>> We would think that this would clear the commit_resumed_state flag of
>> our native target, but that's not the case, because
>> scoped_disable_commit_resumed iterates on non-exited inferiors in order
>> to find active process targets.  And after the kill, the inferior is
>> exited, and the native target was unpushed from it anyway.  So
>> scoped_disable_commit_resumed doesn't touch the commit_resumed_state
>> flag of the native target, it stays true.  When reaching target_wait, in
>> startup_inferior (to consume the initial expected stop events while the
>> inferior is starting up and working its way through the shell),
>> commit_resumed_state is true, breaking the contract saying that
>> commit_resumed_state is always false when calling the targets' wait
>> method.
>>
>> (note: to be correct, I think that startup_inferior should toggle
>> commit_resumed between the target_wait and target_resume calls, but I'll
>> ignore that for now)
>>
>> I can see multiple ways to fix this.  In the end, we need
>> commit_resumed_state to be cleared by the time we get to that
>> target_wait.  It could be done at the end of the kill command, or at the
>> beginning of the run command.
>>
>> To keep things in a coherent state, I'd like to make it so that after
>> the kill command, when the target is left with no threads, its
>> commit_resumed_state flag is left false.  This way, we can keep
>> working with the assumption that a target with no threads (and therefore
>> no running threads) has commit_resumed_state == false.
>>
>> Do this by adding a scoped_disable_commit_resumed in target_kill.  It
>> clears the target's commit_resumed_state on entry, and leaves it false
>> if the target does not have any resumed thread on exit.  That means,
>> even if the target has another inferior with stopped threads,
>> commit_resumed_state will be left false, which makes sense.
>>
>> Add a test that tries to cover various combinations of actions done
>> while an inferior is running (and therefore while commit_resumed_state
>> is true on the process target).
>>
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28275
>> Change-Id: I8e6fe6dc1f475055921520e58cab68024039a1e9
>> ---
>>  gdb/target.c                                  |   8 +-
>>  .../gdb.base/run-control-while-bg-execution.c |  33 +++++
>>  .../run-control-while-bg-execution.exp        | 118 ++++++++++++++++++
>>  3 files changed, 157 insertions(+), 2 deletions(-)
>>  create mode 100644 gdb/testsuite/gdb.base/run-control-while-bg-execution.c
>>  create mode 100644 gdb/testsuite/gdb.base/run-control-while-bg-execution.exp
>>
>> diff --git a/gdb/target.c b/gdb/target.c
>> index 4a22885b82f..f5c6212310a 100644
>> --- a/gdb/target.c
>> +++ b/gdb/target.c
>> @@ -907,8 +907,12 @@ add_deprecated_target_alias (const target_info &tinfo, const char *alias)
>>  void
>>  target_kill (void)
>>  {
>> -  current_inferior ()->top_target ()->kill ();
>> -}
>> +
>> +  /* Ensure that, if commit resumed for the target is currently true (threads
>> +     are running), if we kill the last running inferior, commit resumed ends up
>> +     false.  */
>> +   scoped_disable_commit_resumed disable ("killing"); current_inferior
>> +   ()->top_target ()->kill (); }
> 
> I think something went wrong with the formatting here!

Whoops, fixed.

> 
> I first read this comment before reading the commit message, and,
> initially, the comment didn't make any sense to me.
> 
> My thinking was, surely the scoped_disable_commit_resumed would set the
> commit resumed state to false, but only for the duration of this scope.
> When we leave the scope, the state will be set back to true...
> 
> ...the subtlety of course, is that when we create the
> scoped_disable_commit_resumed, the current inferior is non-exited, and
> so has its state set to false, but, when we exit the scope, the current
> inferior will be exited, so its commit resume state will be left false.
> 
> Personally, I think this is a non-obvious detail.  I think it would be
> helpful if the comment explained this aspect a little more.

I agree, trying to fit it all in a single sentence didn't help.  What
about:

  /* If the commit_resumed_state of the to-be-killed inferior's process stratum
     target is true, and this inferior is the last live inferior with resumed
     threads of that target, then we want to leave commit_resumed_state false,
     as the target won't have any resumed threads anymore.  We achieve this
     with this scoped_disable_commit_resumed.  On construction, it will set the
     flag to false.  On destruction, it will only set it to true if there are
     resumed threads left.  */
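
To make the construction/destruction behavior concrete, here is a minimal,
self-contained C++ model of it.  It is only a sketch: model_target,
model_scoped_disable_commit_resumed and model_kill are hypothetical names made
up for illustration, and the model tracks a single target with a plain thread
counter, whereas the real scoped_disable_commit_resumed works on the process
targets of all non-exited inferiors.

```cpp
#include <cassert>

// Hypothetical stand-in for a process stratum target.  Not GDB's real class.
struct model_target
{
  bool commit_resumed_state = false;
  int resumed_threads = 0;
};

// Simplified model of the RAII object: clear the flag on construction,
// set it back to true on destruction only if resumed threads remain.
class model_scoped_disable_commit_resumed
{
public:
  explicit model_scoped_disable_commit_resumed (model_target &target)
    : m_target (target)
  {
    m_target.commit_resumed_state = false;
  }

  ~model_scoped_disable_commit_resumed ()
  {
    if (m_target.resumed_threads > 0)
      m_target.commit_resumed_state = true;
  }

  model_scoped_disable_commit_resumed
    (const model_scoped_disable_commit_resumed &) = delete;
  model_scoped_disable_commit_resumed &operator=
    (const model_scoped_disable_commit_resumed &) = delete;

private:
  model_target &m_target;
};

// Model of target_kill: the killed inferior's resumed threads disappear
// while the scoped object is alive.
void
model_kill (model_target &target, int threads_of_killed_inferior)
{
  assert (threads_of_killed_inferior <= target.resumed_threads);

  model_scoped_disable_commit_resumed disable (target);
  target.resumed_threads -= threads_of_killed_inferior;
}
```

In this model, killing the only inferior with resumed threads leaves the flag
false, while killing one of several leaves it true once the scope exits.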

> 
> I'm also seeing something weird with the test when run with the
> native-extended-gdbserver board, this warning:
> 
>   WARNING: Timed out waiting for EOF in server after monitor exit
> 
> I've not done a full investigation, but what seems to happen is GDB
> sends the 'monitor exit', and gdbserver does indeed try to exit.  It
> then looks like pthreads(?) is waiting on some threads?  Or some child
> processes?  At the point gdbserver is not exiting, the stack looks like:
> 
>   #0  0x00007f7009660374 in wait () from /lib64/libpthread.so.0
>   #1  0x000056146a26a858 in reap_children ()
>   #2  0x000056146a26b21c in new_job ()
>   #3  0x000056146a27740c in update_file ()
>   #4  0x000056146a277d64 in update_goal_chain ()
>   #5  0x000056146a25b893 in main ()

I believe this is a stack trace of the make process, probably the one
invoking dejagnu / expect.

> 
> Do you also see this timeout?
> 
> Additionally, I have not (yet) been able to reproduce this when running
> the test from the command line - in that case, gdbserver always exits
> immediately after getting the 'monitor exit'.

Ah, funny that you dug into that too :).  I noticed it and had a
discussion about that with Pedro.  Here's what we found.  Doing "monitor
exit" does indeed make gdbserver exit, after which (during the delay) it
is a zombie (it shows up as <defunct> in ps).

We expect this "eof" expect clause to trigger:

  https://gitlab.com/gnutools/binutils-gdb/-/blob/5e219e0f46055281cfbc9351a3d27a05841be34d/gdb/testsuite/lib/gdbserver-support.exp#L476

Our theory is that the "eof" clause triggers when the pty controlling
GDBserver returns EOF, that is when nobody is connected anymore on the
slave side.  However, there is a child process that is detached during
the test that is still alive, that is still connected to the pty.  So
eof never triggers, we hang until expect gives up.
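
The theory can be illustrated with a small C++ sketch.  Note the assumptions:
it uses a pipe as a stand-in for the pty (a pty reports hang-up differently,
but the rule that end-of-file only arrives once every holder of the other end
is gone is the same idea), and the names eof_probe_result and
probe_eof_with_lingering_holder are made up for this example.  A "server"
process exits right away, but a process it forked keeps the write end open.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct eof_probe_result
{
  bool ok;                      // False if setting up the experiment failed.
  bool eof_before_holder_exit;  // EOF seen while the lingering process lives?
  bool eof_after_holder_exit;   // EOF seen once it is gone?
};

eof_probe_result
probe_eof_with_lingering_holder ()
{
  eof_probe_result result = { false, false, false };
  int out[2], ctl[2];

  if (pipe (out) != 0)
    return result;
  if (pipe (ctl) != 0)
    {
      close (out[0]);
      close (out[1]);
      return result;
    }

  pid_t server = fork ();
  if (server == -1)
    {
      close (out[0]); close (out[1]);
      close (ctl[0]); close (ctl[1]);
      return result;
    }

  if (server == 0)
    {
      // "Server": leave behind a child that inherits out[1], then exit.
      close (out[0]);
      close (ctl[1]);
      if (fork () == 0)
        {
          // Lingering process: stay alive until the control pipe is closed.
          char c;
          while (read (ctl[0], &c, 1) > 0)
            ;
          _exit (0);
        }
      _exit (0);
    }

  close (out[1]);
  close (ctl[0]);

  // Reap the "server".  It is gone, but its child still holds out[1].
  pid_t reaped = waitpid (server, nullptr, 0);
  assert (reaped == server);

  // No data and a writer still exists: poll times out, no EOF yet.
  pollfd pfd = { out[0], POLLIN, 0 };
  result.eof_before_holder_exit = poll (&pfd, 1, 200) > 0;

  // Let the lingering process exit; only now does read report EOF.
  close (ctl[1]);
  char c;
  result.eof_after_holder_exit = read (out[0], &c, 1) == 0;

  close (out[0]);
  result.ok = true;
  return result;
}
```

Here the reader sees no end-of-file after the "server" has exited and been
reaped, only after the lingering process goes away as well.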

Actually, I have this workaround in the patch:

+	# Kill the detached process, otherwise that makes "monitor exit" hang
+	# until the process disappears.
+	#remote_exec target "kill $child_pid"

I commented it out to investigate the issue, and forgot to uncomment it.
I'll uncomment it.

I think it would work fine to do the "wait" right after sending the
"monitor exit", just like this:

    if { $is_mi } {
	set monitor_exit "-interpreter-exec console \"monitor exit\""
    } else {
	set monitor_exit "monitor exit"
    }
    gdb_test -nopass "$monitor_exit" "" "monitor exit"
    wait -i $server_spawn_id

That way, we reap gdbserver even if the detached process is still alive.
Worst case, the pty and detached process stay there until the detached
process exits by itself (which generally happens with our testsuite
programs, we try not to write infinite loops).  The problem is that
"wait" has no timeout mechanism, so it could hang forever.  Imagine the
"monitor exit" fails for some reason, or GDBserver is stuck in a loop
while trying to exit.  Guarding against that is the point of the current
code.
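
For illustration, here is what "wait, but with a timeout" can look like at the
POSIX level.  This is a C++ sketch, not testsuite code (the testsuite is
Tcl/expect), and reap_with_timeout, spawn_child and reap_or_kill are
hypothetical helpers invented for this example.  It polls waitpid with WNOHANG
until a deadline, and falls back to SIGKILL if the child never exits.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <thread>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Poll for the exit of child PID until TIMEOUT elapses.  Return true if the
// child was reaped, false on timeout or if PID is not a waitable child.
bool
reap_with_timeout (pid_t pid, std::chrono::milliseconds timeout)
{
  assert (pid > 0);  // 0 and negative values mean something else to waitpid.

  const auto deadline = std::chrono::steady_clock::now () + timeout;
  for (;;)
    {
      pid_t res = waitpid (pid, nullptr, WNOHANG);
      if (res == pid)
        return true;
      if (res == -1 && errno != EINTR)
        return false;
      if (std::chrono::steady_clock::now () >= deadline)
        return false;
      std::this_thread::sleep_for (std::chrono::milliseconds (10));
    }
}

// Hypothetical helper for trying this out: fork a child that either exits at
// once or blocks until it is killed.  Returns the child's pid, or -1.
pid_t
spawn_child (bool exit_immediately)
{
  pid_t pid = fork ();
  if (pid == 0)
    {
      if (!exit_immediately)
        for (;;)
          pause ();
      _exit (0);
    }
  return pid;
}

// Reap child PID, giving it TIMEOUT to exit by itself.  If it does not, kill
// it and reap it anyway.  Return true if it exited on its own in time.
// PID must be a child of the caller that has not been reaped yet.
bool
reap_or_kill (pid_t pid, std::chrono::milliseconds timeout)
{
  if (reap_with_timeout (pid, timeout))
    return true;

  kill (pid, SIGKILL);
  waitpid (pid, nullptr, 0);
  return false;
}
```

With something like this, a stuck process costs at most the timeout instead
of hanging the run, at the price of a polling loop.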

Note that this isn't unique to my test.  I tried putting a fail in the
eof clause and ran the testsuite, I had about 20 such failures when
running the native-extended-gdbserver board.

Maybe the solution is just to make sure the tests don't leave detached
processes hanging around.  We can put a fail in there and fix the
offending tests.  Any new test that leaves child processes running will
be easy to spot.

Simon

