From: Andrew Burgess <aburgess@redhat.com>
To: Lancelot SIX via Gdb-patches <gdb-patches@sourceware.org>,
gdb-patches@sourceware.org
Cc: lsix@lancelotsix.com, Lancelot SIX <lancelot.six@amd.com>
Subject: Re: [PATCH v3] gdb/gcore: interrupt all threads before generating the corefile
Date: Mon, 28 Nov 2022 15:20:54 +0000 [thread overview]
Message-ID: <874juj2jhl.fsf@redhat.com> (raw)
In-Reply-To: <20221116123649.2430701-1-lancelot.six@amd.com>
Lancelot SIX via Gdb-patches <gdb-patches@sourceware.org> writes:
> Hi,
>
> This is a V3 following
> https://sourceware.org/pipermail/gdb-patches/2022-October/192746.html.
>
> Noticeable changes since V2:
> - The test have been updated following Pedro's comment to be compatible
> with --target_board=native-extended-gdbserver.
> - Running the entire testsuite against gdbserver revealed that the
> original implementation was incompatible with all-stop targets. The
> V3 improves the implementation to support both all-stop and non-stop
> targets.
>
> This patch have been regression tested against gdb and gdbserver
> (native-extended-gdbserver).
>
> All feedbacks are welcome.
>
> Best,
> Lancelot
>
> ---
>
> In non-stop mode, if the user tries to generate a core dump (using the
> gcore command) while some threads are running, a non-helpful error
> message is shown.
>
> Lets consider the following session as an example (debugging the test
> program included in this patch):
>
> (gdb) set non-stop on
> (gdb) b 37
> (gdb) r
> Thread 1 "gcore-nonstop" hit Breakpoint 1, main () at gcore-nonstop.c:39
> (gdb) info thread
> Id Target Id Frame
> * 1 Thread 0x7ffff7d7a740 (LWP 431838) "gcore-nonstop" main () at gcore-nonstop.c:39
> 2 Thread 0x7ffff7d79640 (LWP 431841) "gcore-nonstop" (running)
> (gdb) gcore
> Couldn't get registers: No such process.
>
> The reported error ("No such process") does not help the user understand
> what happens. This is due to the fact that we cannot access the
> registers of a running thread. Even if we ignore this error, generating
> a core dump while any thread might update memory would most likely
> result in a core file with an inconsistent view of the process' memory.
>
> To solve this, this patch proposes to change the gcore command so it
> first stops all running threads (from the current inferior) before
> generating the corefile, and then resumes them in their previous state.
>
> To achieve this, this patch exposes the restart_threads function in infrun.h
> (used to be local to infrun.c). We also allow the first parameter
> (event_thread) to be nullptr as it is possible that the gcore command is
> called while all threads are running, in which case we want all threads
> to be restarted at the end of the procedure.
>
> When testing this patch against gdbserver, it appears that using
> stop_all_threads / restart_threads was not compatible with all-stop
> targets. For those targets, we need to call target_stop_and_wait /
> target_resume. The gcore_command has code to handle both
> configurations. I could not use a RAII-like object to have a cleaner
> way to restore the state at the end as the restore procedure could
> throw. Instead, the procedure keeps track of which method was used to
> interrupt threads so the appropriate method can be used to restore their
> state.
>
> Tested on x86_64 on navite GDB and the native-extended-gdbserver board.
> ---
> gdb/NEWS | 5 ++
> gdb/doc/gdb.texinfo | 5 ++
> gdb/gcore.c | 37 ++++++++++++
> gdb/infrun.c | 16 ++---
> gdb/infrun.h | 9 +++
> gdb/testsuite/gdb.base/gcore-nonstop.c | 44 ++++++++++++++
> gdb/testsuite/gdb.base/gcore-nonstop.exp | 77 ++++++++++++++++++++++++
> 7 files changed, 182 insertions(+), 11 deletions(-)
> create mode 100644 gdb/testsuite/gdb.base/gcore-nonstop.c
> create mode 100644 gdb/testsuite/gdb.base/gcore-nonstop.exp
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index 3f31515297c..adc0963aecb 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -128,6 +128,11 @@ set style tui-current-position [on|off]
>
> * Changed commands
>
> +gcore
I think you should list both gcore and generate-core-file here.
> + GDB now ensures that all threads of the current inferior are stopped
> + before generating a core dump. At the end of the command, threads are
> + restored to their previous state.
> +
> document user-defined
> It is now possible to document user-defined aliases.
> When a user-defined alias is documented, the help and apropos commands
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index f5f664fd168..ec2eba4b410 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -13570,6 +13570,11 @@ Produce a core dump of the inferior process. The optional argument
> specified, the file name defaults to @file{core.@var{pid}}, where
> @var{pid} is the inferior process ID.
>
> +@value{GDBN} ensures that all threads of the current inferior are stopped
> +while generating the core dump. If any of the inferior's thread is found
> +running when executing this command, @value{GDBN} stops it and resumes it
> +when the command is done.
I think the second sentence needs rewording. I would propose:
If any of the inferior's threads are running when executing this
command, @value{GDBN} stops the threads and resumes them when the
command is done.
> +
> Note that this command is implemented only for some systems (as of
> this writing, @sc{gnu}/Linux, FreeBSD, Solaris, and S390).
>
> diff --git a/gdb/gcore.c b/gdb/gcore.c
> index ede78534bd8..dda9e7af50b 100644
> --- a/gdb/gcore.c
> +++ b/gdb/gcore.c
> @@ -34,6 +34,7 @@
> #include "regset.h"
> #include "gdb_bfd.h"
> #include "readline/tilde.h"
> +#include "infrun.h"
> #include <algorithm>
> #include "gdbsupport/gdb_unlinker.h"
> #include "gdbsupport/byte-vector.h"
> @@ -131,6 +132,35 @@ gcore_command (const char *args, int from_tty)
> if (!target_has_execution ())
> noprocess ();
>
> + scoped_restore_current_thread restore_current_thread;
> + scoped_disable_commit_resumed disable_commit_resume ("generating coredump");
> + struct inferior *inf = current_inferior ();
> + scoped_finish_thread_state finish_state (inf->process_target (),
> + ptid_t (inferior_ptid.pid ()));
Through all of the following code you work with INF. I guess you maybe
did this to reduce the line lengths? But it also has the nice effect
that the following code is mostly independent of the global
current_inferior() state.
However, I found it a little confusing that when building the ptid_t you
still make use of the global inferior_ptid.
I think I'm correct to say that all of the 'inferior_ptid.pid ()' calls
could be replaced with 'inf->pid', which I think is clearer.
Certainly, I tried making this change, and the test still passed. Is
there any reason why this wouldn't be a valid change?
> + bool all_stop_was_running = false;
> + if (exists_non_stop_target ())
> + stop_all_threads ("generating coredump", inf);
> + else
> + {
> + for (thread_info *t
> + : all_non_exited_threads (inf->process_target (),
I don't think there's any reason to wrap this line, unwrapped it's still
only 74 characters.
Though I think you could just use 'inf->non_exited_thread ()' instead.
> + ptid_t (inferior_ptid.pid ())))
> + {
> + /* All threads are executing or none is, no need to go through the
> + entire list. */
> + all_stop_was_running = t->executing ();
> + break;
> + }
Actually, I think you could replace this whole loop with:
const thread_info *thr = any_thread_of_inferior (inf);
all_stop_was_running = thr->executing ();
> +
> + if (all_stop_was_running)
> + {
> + if (!may_stop)
> + error (_("Cannot stop the target to generate the core dump."));
> +
> + target_stop_and_wait (ptid_t (inferior_ptid.pid ()));
> + }
> + }
> +
> if (args && *args)
> corefilename.reset (tilde_expand (args));
> else
> @@ -161,6 +191,13 @@ gcore_command (const char *args, int from_tty)
> }
>
> gdb_printf ("Saved corefile %s\n", corefilename.get ());
> +
> + if (exists_non_stop_target ())
> + restart_threads (nullptr, inf);
> + else if (all_stop_was_running)
> + target_resume (ptid_t (inferior_ptid.pid ()), 0, GDB_SIGNAL_0);
> +
> + disable_commit_resume.reset_and_commit ();
> }
>
> static enum bfd_architecture
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 6da46b75ac7..c7c8ad5456f 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -96,9 +96,6 @@ static void resume (gdb_signal sig);
>
> static void wait_for_inferior (inferior *inf);
>
> -static void restart_threads (struct thread_info *event_thread,
> - inferior *inf = nullptr);
> -
> static bool start_step_over (void);
>
> static bool step_over_info_valid_p (void);
> @@ -5886,18 +5883,15 @@ handle_inferior_event (struct execution_control_state *ecs)
> }
> }
>
> -/* Restart threads back to what they were trying to do back when we
> - paused them (because of an in-line step-over or vfork, for example).
> - The EVENT_THREAD thread is ignored (not restarted).
> -
> - If INF is non-nullptr, only resume threads from INF. */
> +/* See infrun.h. */
>
> -static void
> +void
> restart_threads (struct thread_info *event_thread, inferior *inf)
> {
> INFRUN_SCOPED_DEBUG_START_END ("event_thread=%s, inf=%d",
> - event_thread->ptid.to_string ().c_str (),
> - inf != nullptr ? inf->num : -1);
> + (event_thread != nullptr
> + ? event_thread->ptid.to_string ().c_str ()
> + : "None"), inf != nullptr ? inf->num : -1);
>
> gdb_assert (!step_over_info_valid_p ());
>
> diff --git a/gdb/infrun.h b/gdb/infrun.h
> index c711b9b21cc..4cd98ec06c5 100644
> --- a/gdb/infrun.h
> +++ b/gdb/infrun.h
> @@ -175,6 +175,15 @@ extern void nullify_last_target_wait_ptid ();
> all threads of all inferiors. */
> extern void stop_all_threads (const char *reason, inferior *inf = nullptr);
>
> +/* Restart threads back to what they were trying to do back when we
> + paused them (because of an in-line step-over or vfork, for example).
> + The EVENT_THREAD thread, if non-nullptr, is ignored (not restarted).
> +
> + If INF is non-nullptr, only resume threads from INF. */
> +
> +extern void restart_threads (struct thread_info *event_thread,
> + inferior *inf = nullptr);
> +
> extern void prepare_for_detach (void);
>
> extern void fetch_inferior_event ();
> diff --git a/gdb/testsuite/gdb.base/gcore-nonstop.c b/gdb/testsuite/gdb.base/gcore-nonstop.c
> new file mode 100644
> index 00000000000..191a1a26849
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/gcore-nonstop.c
> @@ -0,0 +1,44 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> + Copyright 2022 Free Software Foundation, Inc.
> +
> + This program is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3 of the License, or
> + (at your option) any later version.
> +
> + This program is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
> +
> +#include <pthread.h>
> +
> +static pthread_barrier_t barrier;
> +
> +static void *
> +worker_func (void *)
For me, gcc 9.3.1, this test doesn't compile due to the missing argument
name here. Changing to
worker_func (void *ignored)
works just fine. Could we do that? If you really wanted you could
annotate with '__attribute__ ((unused))' though this doesn't seem to be
widespread in our testsuite, so personally, I'd not bother, but it's up
to you.
> +{
> + pthread_barrier_wait (&barrier);
> + return NULL;
> +}
> +
> +int
> +main (void)
> +{
> + pthread_t worker_thread;
> + pthread_barrier_init (&barrier, NULL, 2);
> +
> + pthread_create (&worker_thread, NULL, worker_func, NULL);
> +
> + /* Break here. */
> +
> + pthread_barrier_wait (&barrier);
> + pthread_join (worker_thread, NULL);
> + pthread_barrier_destroy (&barrier);
> +
> + return 0;
> +}
> diff --git a/gdb/testsuite/gdb.base/gcore-nonstop.exp b/gdb/testsuite/gdb.base/gcore-nonstop.exp
> new file mode 100644
> index 00000000000..56fc909b00b
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/gcore-nonstop.exp
> @@ -0,0 +1,77 @@
> +# Copyright 2022 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
> +
> +# This testcase checks that when in non-stop mode with some threads running
> +# the gcore command can interrupt all threads, generate a core dump and
> +# restart threads as required.
> +
> +standard_testfile
> +
> +if { [prepare_for_testing "failed to prepare" \
> + ${testfile} ${srcfile} {threads debug}] } {
I think: s/threads/pthreads/ otherwise, for me, this test doesn't link
with -lpthread, and fails to link.
Thanks,
Andrew
> + return
> +}
> +
> +save_vars { GDBFLAGS } {
> + append GDBFLAGS " -ex \"set non-stop on\""
> + clean_restart ${binfile}
> +}
> +
> +set lineno [gdb_get_line_number "Break here"]
> +if { ![runto $lineno] } {
> + return
> +}
> +
> +# We should be stopped in thread 1 while thread 2 is running.
> +gdb_test_sequence "info threads" "info threads" {
> + {Id\s+Target Id\s+Frame}
> + {\*\s+1[^\n]*\n}
> + {\s+2\s+[^\n]*\(running\)[^\n]*\n}
> +}
> +
> +set th1_pc ""
> +gdb_test_multiple "p/x \$pc" "fetch thread 1 PC" {
> + -wrap -re "$::decimal = ($::hex)" {
> + set th1_pc $expect_out(1,string)
> + pass $gdb_test_name
> + }
> +}
> +
> +set corefile [standard_output_file "corefile"]
> +if {![gdb_gcore_cmd $corefile "generate corefile"]} {
> + # gdb_gcore_cmd issues a "UNSUPPORTED".
> + return
> +}
> +
> +# After the core file is generated, thread 2 should be back running
> +# and thread 1 should still be selected.
> +gdb_test_sequence "info threads" "correct thread selection after gcore" {
> + {Id\s+Target Id\s+Frame}
> + {\*\s+1[^\n]*\n}
> + {\s+2\s+[^\n]*\(running\)[^\n]*\n}
> +}
> +
> +# Thread 1 is at the same stop it was before calling the gcore command.
> +gdb_test "p/x \$pc" "\\\$$::decimal = $th1_pc" "thread 1 unchanged"
> +
> +clean_restart $binfile
> +gdb_test "core-file $corefile" "Core was generated by.*" "load corefile"
> +
> +# The core file has the 2 threads.
> +gdb_test_sequence "info threads" "threads in corefile" {
> + {Id\s+Target Id\s+Frame}
> + {\s+1\s+Thread[^\n]*\n}
> + {\s+2\s+Thread[^\n]*\n}
> +}
>
> --
> 2.34.1
next prev parent reply other threads:[~2022-11-28 15:20 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-16 12:36 Lancelot SIX
2022-11-28 10:46 ` [PING] " Lancelot SIX
2022-11-28 15:20 ` Andrew Burgess [this message]
2022-12-02 20:05 ` Lancelot SIX
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=874juj2jhl.fsf@redhat.com \
--to=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=lancelot.six@amd.com \
--cc=lsix@lancelotsix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).