From: Lancelot SIX <Lancelot.Six@amd.com>
To: Andrew Burgess <aburgess@redhat.com>,
Lancelot SIX via Gdb-patches <gdb-patches@sourceware.org>
Cc: lsix@lancelotsix.com
Subject: Re: [PATCH v3] gdb/gcore: interrupt all threads before generating the corefile
Date: Fri, 2 Dec 2022 20:05:41 +0000 [thread overview]
Message-ID: <badbe732-aec6-6f6f-045c-fea294355663@amd.com> (raw)
In-Reply-To: <874juj2jhl.fsf@redhat.com>
Hi Andrew,
Thanks for the review. I have been through your various comments and all
seem valid points.
I'll try to send a revised patch when possible integrating all of your
comments.
Best,
Lancelot.
On 28/11/2022 15:20, Andrew Burgess wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> Lancelot SIX via Gdb-patches <gdb-patches@sourceware.org> writes:
>
>> Hi,
>>
>> This is a V3 following
>> https://sourceware.org/pipermail/gdb-patches/2022-October/192746.html.
>>
>> Noticeable changes since V2:
>> - The test have been updated following Pedro's comment to be compatible
>> with --target_board=native-extended-gdbserver.
>> - Running the entire testsuite against gdbserver revealed that the
>> original implementation was incompatible with all-stop targets. The
>> V3 improves the implementation to support both all-stop and non-stop
>> targets.
>>
>> This patch have been regression tested against gdb and gdbserver
>> (native-extended-gdbserver).
>>
>> All feedbacks are welcome.
>>
>> Best,
>> Lancelot
>>
>> ---
>>
>> In non-stop mode, if the user tries to generate a core dump (using the
>> gcore command) while some threads are running, a non-helpful error
>> message is shown.
>>
>> Lets consider the following session as an example (debugging the test
>> program included in this patch):
>>
>> (gdb) set non-stop on
>> (gdb) b 37
>> (gdb) r
>> Thread 1 "gcore-nonstop" hit Breakpoint 1, main () at gcore-nonstop.c:39
>> (gdb) info thread
>> Id Target Id Frame
>> * 1 Thread 0x7ffff7d7a740 (LWP 431838) "gcore-nonstop" main () at gcore-nonstop.c:39
>> 2 Thread 0x7ffff7d79640 (LWP 431841) "gcore-nonstop" (running)
>> (gdb) gcore
>> Couldn't get registers: No such process.
>>
>> The reported error ("No such process") does not help the user understand
>> what happens. This is due to the fact that we cannot access the
>> registers of a running thread. Even if we ignore this error, generating
>> a core dump while any thread might update memory would most likely
>> result in a core file with an inconsistent view of the process' memory.
>>
>> To solve this, this patch proposes to change the gcore command so it
>> first stops all running threads (from the current inferior) before
>> generating the corefile, and then resumes them in their previous state.
>>
>> To achieve this, this patch exposes the restart_threads function in infrun.h
>> (used to be local to infrun.c). We also allow the first parameter
>> (event_thread) to be nullptr as it is possible that the gcore command is
>> called while all threads are running, in which case we want all threads
>> to be restarted at the end of the procedure.
>>
>> When testing this patch against gdbserver, it appears that using
>> stop_all_threads / restart_threads was not compatible with all-stop
>> targets. For those targets, we need to call target_stop_and_wait /
>> target_resume. The gcore_command has code to handle both
>> configurations. I could not use a RAII-like object to have a cleaner
>> way to restore the state at the end as the restore procedure could
>> throw. Instead, the procedure keeps track of which method was used to
>> interrupt threads so the appropriate method can be used to restore their
>> state.
>>
>> Tested on x86_64 on navite GDB and the native-extended-gdbserver board.
>> ---
>> gdb/NEWS | 5 ++
>> gdb/doc/gdb.texinfo | 5 ++
>> gdb/gcore.c | 37 ++++++++++++
>> gdb/infrun.c | 16 ++---
>> gdb/infrun.h | 9 +++
>> gdb/testsuite/gdb.base/gcore-nonstop.c | 44 ++++++++++++++
>> gdb/testsuite/gdb.base/gcore-nonstop.exp | 77 ++++++++++++++++++++++++
>> 7 files changed, 182 insertions(+), 11 deletions(-)
>> create mode 100644 gdb/testsuite/gdb.base/gcore-nonstop.c
>> create mode 100644 gdb/testsuite/gdb.base/gcore-nonstop.exp
>>
>> diff --git a/gdb/NEWS b/gdb/NEWS
>> index 3f31515297c..adc0963aecb 100644
>> --- a/gdb/NEWS
>> +++ b/gdb/NEWS
>> @@ -128,6 +128,11 @@ set style tui-current-position [on|off]
>>
>> * Changed commands
>>
>> +gcore
>
> I think you should list both gcore and generate-core-file here.
>
>> + GDB now ensures that all threads of the current inferior are stopped
>> + before generating a core dump. At the end of the command, threads are
>> + restored to their previous state.
>> +
>> document user-defined
>> It is now possible to document user-defined aliases.
>> When a user-defined alias is documented, the help and apropos commands
>> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
>> index f5f664fd168..ec2eba4b410 100644
>> --- a/gdb/doc/gdb.texinfo
>> +++ b/gdb/doc/gdb.texinfo
>> @@ -13570,6 +13570,11 @@ Produce a core dump of the inferior process. The optional argument
>> specified, the file name defaults to @file{core.@var{pid}}, where
>> @var{pid} is the inferior process ID.
>>
>> +@value{GDBN} ensures that all threads of the current inferior are stopped
>> +while generating the core dump. If any of the inferior's thread is found
>> +running when executing this command, @value{GDBN} stops it and resumes it
>> +when the command is done.
>
> I think the second sentence needs rewording. I would propose:
>
> If any of the inferior's threads are running when executing this
> command, @value{GDBN} stops the threads and resumes them when the
> command is done.
>
>> +
>> Note that this command is implemented only for some systems (as of
>> this writing, @sc{gnu}/Linux, FreeBSD, Solaris, and S390).
>>
>> diff --git a/gdb/gcore.c b/gdb/gcore.c
>> index ede78534bd8..dda9e7af50b 100644
>> --- a/gdb/gcore.c
>> +++ b/gdb/gcore.c
>> @@ -34,6 +34,7 @@
>> #include "regset.h"
>> #include "gdb_bfd.h"
>> #include "readline/tilde.h"
>> +#include "infrun.h"
>> #include <algorithm>
>> #include "gdbsupport/gdb_unlinker.h"
>> #include "gdbsupport/byte-vector.h"
>> @@ -131,6 +132,35 @@ gcore_command (const char *args, int from_tty)
>> if (!target_has_execution ())
>> noprocess ();
>>
>> + scoped_restore_current_thread restore_current_thread;
>> + scoped_disable_commit_resumed disable_commit_resume ("generating coredump");
>> + struct inferior *inf = current_inferior ();
>> + scoped_finish_thread_state finish_state (inf->process_target (),
>> + ptid_t (inferior_ptid.pid ()));
>
> Through all of the following code you work with INF. I guess you maybe
> did this to reduce the line lengths? But it also has the nice effect
> that the following code is mostly independent of the global
> current_inferior() state.
>
> However, I found it a little confusing that when building the ptid_t you
> still make use of the global inferior_ptid.
>
> I think I'm correct to say that all of the 'inferior_ptid.pid ()' calls
> could be replaced with 'inf->pid', which I think is clearer.
>
> Certainly, I tried making this change, and the test still passed. Is
> there any reason why this wouldn't be a valid change?
>
>> + bool all_stop_was_running = false;
>> + if (exists_non_stop_target ())
>> + stop_all_threads ("generating coredump", inf);
>> + else
>> + {
>> + for (thread_info *t
>> + : all_non_exited_threads (inf->process_target (),
>
> I don't think there's any reason to wrap this line, unwrapped it's still
> only 74 characters.
>
> Though I think you could just use 'inf->non_exited_thread ()' instead.
>
>> + ptid_t (inferior_ptid.pid ())))
>> + {
>> + /* All threads are executing or none is, no need to go through the
>> + entire list. */
>> + all_stop_was_running = t->executing ();
>> + break;
>> + }
>
> Actually, I think you could replace this whole loop with:
>
> const thread_info *thr = any_thread_of_inferior (inf);
> all_stop_was_running = thr->executing ();
>
>> +
>> + if (all_stop_was_running)
>> + {
>> + if (!may_stop)
>> + error (_("Cannot stop the target to generate the core dump."));
>> +
>> + target_stop_and_wait (ptid_t (inferior_ptid.pid ()));
>> + }
>> + }
>> +
>> if (args && *args)
>> corefilename.reset (tilde_expand (args));
>> else
>> @@ -161,6 +191,13 @@ gcore_command (const char *args, int from_tty)
>> }
>>
>> gdb_printf ("Saved corefile %s\n", corefilename.get ());
>> +
>> + if (exists_non_stop_target ())
>> + restart_threads (nullptr, inf);
>> + else if (all_stop_was_running)
>> + target_resume (ptid_t (inferior_ptid.pid ()), 0, GDB_SIGNAL_0);
>> +
>> + disable_commit_resume.reset_and_commit ();
>> }
>>
>> static enum bfd_architecture
>> diff --git a/gdb/infrun.c b/gdb/infrun.c
>> index 6da46b75ac7..c7c8ad5456f 100644
>> --- a/gdb/infrun.c
>> +++ b/gdb/infrun.c
>> @@ -96,9 +96,6 @@ static void resume (gdb_signal sig);
>>
>> static void wait_for_inferior (inferior *inf);
>>
>> -static void restart_threads (struct thread_info *event_thread,
>> - inferior *inf = nullptr);
>> -
>> static bool start_step_over (void);
>>
>> static bool step_over_info_valid_p (void);
>> @@ -5886,18 +5883,15 @@ handle_inferior_event (struct execution_control_state *ecs)
>> }
>> }
>>
>> -/* Restart threads back to what they were trying to do back when we
>> - paused them (because of an in-line step-over or vfork, for example).
>> - The EVENT_THREAD thread is ignored (not restarted).
>> -
>> - If INF is non-nullptr, only resume threads from INF. */
>> +/* See infrun.h. */
>>
>> -static void
>> +void
>> restart_threads (struct thread_info *event_thread, inferior *inf)
>> {
>> INFRUN_SCOPED_DEBUG_START_END ("event_thread=%s, inf=%d",
>> - event_thread->ptid.to_string ().c_str (),
>> - inf != nullptr ? inf->num : -1);
>> + (event_thread != nullptr
>> + ? event_thread->ptid.to_string ().c_str ()
>> + : "None"), inf != nullptr ? inf->num : -1);
>>
>> gdb_assert (!step_over_info_valid_p ());
>>
>> diff --git a/gdb/infrun.h b/gdb/infrun.h
>> index c711b9b21cc..4cd98ec06c5 100644
>> --- a/gdb/infrun.h
>> +++ b/gdb/infrun.h
>> @@ -175,6 +175,15 @@ extern void nullify_last_target_wait_ptid ();
>> all threads of all inferiors. */
>> extern void stop_all_threads (const char *reason, inferior *inf = nullptr);
>>
>> +/* Restart threads back to what they were trying to do back when we
>> + paused them (because of an in-line step-over or vfork, for example).
>> + The EVENT_THREAD thread, if non-nullptr, is ignored (not restarted).
>> +
>> + If INF is non-nullptr, only resume threads from INF. */
>> +
>> +extern void restart_threads (struct thread_info *event_thread,
>> + inferior *inf = nullptr);
>> +
>> extern void prepare_for_detach (void);
>>
>> extern void fetch_inferior_event ();
>> diff --git a/gdb/testsuite/gdb.base/gcore-nonstop.c b/gdb/testsuite/gdb.base/gcore-nonstop.c
>> new file mode 100644
>> index 00000000000..191a1a26849
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.base/gcore-nonstop.c
>> @@ -0,0 +1,44 @@
>> +/* This testcase is part of GDB, the GNU debugger.
>> +
>> + Copyright 2022 Free Software Foundation, Inc.
>> +
>> + This program is free software; you can redistribute it and/or modify
>> + it under the terms of the GNU General Public License as published by
>> + the Free Software Foundation; either version 3 of the License, or
>> + (at your option) any later version.
>> +
>> + This program is distributed in the hope that it will be useful,
>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + GNU General Public License for more details.
>> +
>> + You should have received a copy of the GNU General Public License
>> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
>> +
>> +#include <pthread.h>
>> +
>> +static pthread_barrier_t barrier;
>> +
>> +static void *
>> +worker_func (void *)
>
> For me, gcc 9.3.1, this test doesn't compile due to the missing argument
> name here. Changing to
>
> worker_func (void *ignored)
>
> works just fine. Could we do that? If you really wanted you could
> annotate with '__attribute__ ((unused))' though this doesn't seem to be
> widespread in our testsuite, so personally, I'd not bother, but it's up
> to you.
>
>> +{
>> + pthread_barrier_wait (&barrier);
>> + return NULL;
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> + pthread_t worker_thread;
>> + pthread_barrier_init (&barrier, NULL, 2);
>> +
>> + pthread_create (&worker_thread, NULL, worker_func, NULL);
>> +
>> + /* Break here. */
>> +
>> + pthread_barrier_wait (&barrier);
>> + pthread_join (worker_thread, NULL);
>> + pthread_barrier_destroy (&barrier);
>> +
>> + return 0;
>> +}
>> diff --git a/gdb/testsuite/gdb.base/gcore-nonstop.exp b/gdb/testsuite/gdb.base/gcore-nonstop.exp
>> new file mode 100644
>> index 00000000000..56fc909b00b
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.base/gcore-nonstop.exp
>> @@ -0,0 +1,77 @@
>> +# Copyright 2022 Free Software Foundation, Inc.
>> +
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
>> +
>> +# This testcase checks that when in non-stop mode with some threads running
>> +# the gcore command can interrupt all threads, generate a core dump and
>> +# restart threads as required.
>> +
>> +standard_testfile
>> +
>> +if { [prepare_for_testing "failed to prepare" \
>> + ${testfile} ${srcfile} {threads debug}] } {
>
> I think: s/threads/pthreads/ otherwise, for me, this test doesn't link
> with -lpthread, and fails to link.
>
> Thanks,
> Andrew
>
>> + return
>> +}
>> +
>> +save_vars { GDBFLAGS } {
>> + append GDBFLAGS " -ex \"set non-stop on\""
>> + clean_restart ${binfile}
>> +}
>> +
>> +set lineno [gdb_get_line_number "Break here"]
>> +if { ![runto $lineno] } {
>> + return
>> +}
>> +
>> +# We should be stopped in thread 1 while thread 2 is running.
>> +gdb_test_sequence "info threads" "info threads" {
>> + {Id\s+Target Id\s+Frame}
>> + {\*\s+1[^\n]*\n}
>> + {\s+2\s+[^\n]*\(running\)[^\n]*\n}
>> +}
>> +
>> +set th1_pc ""
>> +gdb_test_multiple "p/x \$pc" "fetch thread 1 PC" {
>> + -wrap -re "$::decimal = ($::hex)" {
>> + set th1_pc $expect_out(1,string)
>> + pass $gdb_test_name
>> + }
>> +}
>> +
>> +set corefile [standard_output_file "corefile"]
>> +if {![gdb_gcore_cmd $corefile "generate corefile"]} {
>> + # gdb_gcore_cmd issues a "UNSUPPORTED".
>> + return
>> +}
>> +
>> +# After the core file is generated, thread 2 should be back running
>> +# and thread 1 should still be selected.
>> +gdb_test_sequence "info threads" "correct thread selection after gcore" {
>> + {Id\s+Target Id\s+Frame}
>> + {\*\s+1[^\n]*\n}
>> + {\s+2\s+[^\n]*\(running\)[^\n]*\n}
>> +}
>> +
>> +# Thread 1 is at the same stop it was before calling the gcore command.
>> +gdb_test "p/x \$pc" "\\\$$::decimal = $th1_pc" "thread 1 unchanged"
>> +
>> +clean_restart $binfile
>> +gdb_test "core-file $corefile" "Core was generated by.*" "load corefile"
>> +
>> +# The core file has the 2 threads.
>> +gdb_test_sequence "info threads" "threads in corefile" {
>> + {Id\s+Target Id\s+Frame}
>> + {\s+1\s+Thread[^\n]*\n}
>> + {\s+2\s+Thread[^\n]*\n}
>> +}
>>
>> --
>> 2.34.1
>
prev parent reply other threads:[~2022-12-02 20:05 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-16 12:36 Lancelot SIX
2022-11-28 10:46 ` [PING] " Lancelot SIX
2022-11-28 15:20 ` Andrew Burgess
2022-12-02 20:05 ` Lancelot SIX [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=badbe732-aec6-6f6f-045c-fea294355663@amd.com \
--to=lancelot.six@amd.com \
--cc=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=lsix@lancelotsix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).