Subject: Re: [PATCH] gdb/gcore: interrupt all threads before generating the corefile
To: Lancelot SIX, gdb-patches@sourceware.org
Cc: lsix@lancelotsix.com
References: <20221006095035.2857747-1-lancelot.six@amd.com>
From: Pedro Alves
Message-ID: <556673bd-79ea-2216-bc94-a41e862b888b@palves.net>
Date: Tue, 11 Oct 2022 19:44:55 +0100
In-Reply-To: <20221006095035.2857747-1-lancelot.six@amd.com>
List-Id: Gdb-patches mailing list

On 2022-10-06 10:50 a.m., Lancelot SIX via Gdb-patches wrote:
> In non-stop mode, if the user tries to generate a core dump (using the
> gcore command) while some threads are running, a non-helpful error
> message is shown.
>
> Lets consider the following session as an example (debugging the test
> program included in this patch):
>
>     (gdb) set non-stop on
>     (gdb) b 37
>     (gdb) r
>     Thread 1 "gcore-nonstop" hit Breakpoint 1, main () at gcore-nonstop.c:39
>     (gdb) info thread
>       Id   Target Id                                          Frame
>     * 1    Thread 0x7ffff7d7a740 (LWP 431838) "gcore-nonstop" main () * at gcore-nonstop.c:39

Weird " * " after "main ()".

>       2    Thread 0x7ffff7d79640 (LWP 431841) "gcore-nonstop" (running)
>     (gdb) gcore
>     Couldn't get registers: No such process.
>
> The reported error ("No such process") does not help the user understand
> what happens. This is due to the fact that we cannot access the
> registers of a running thread. Even if we ignore this error, generating
> a core dump while any thread might update memory would most likely
> result in a core file with an inconsistent view of the process' memory.
>
> To solve this, this patch proposes to change the gcore command so it
> first stops all running threads before generating the corefile, and then
> resumes them in their previous state.
>
> The patch proposes to stop all threads across all the inferiors (not
> just the current one) just in case the memory space is shared between
> inferiors.

The memory space may be shared with processes we're not debugging, too, though.
Seems odd to stop threads running on other targets, if we can easily help it.
E.g., you have 10 inferiors loaded, all running on different remote targets.
Stopping all inferiors means we stop the inferiors running on all those
different remote targets.

>
> To achieve this, this patch exposes the restart_threads function in infrun.h
> (used to be local to infrun.c). We also allow the first parameter
> (event_thread) to be nullptr as it is possible that the gcore command is
> called while all threads are running, in which case we want all threads
> to be restarted at the end of the procedure.
>
> Tested on x86_64.
> ---
>  gdb/gcore.c                              |  8 +++
>  gdb/infrun.c                             | 16 ++----
>  gdb/infrun.h                             |  9 ++++
>  gdb/testsuite/gdb.base/gcore-nonstop.c   | 44 +++++++++++++++++
>  gdb/testsuite/gdb.base/gcore-nonstop.exp | 62 ++++++++++++++++++++++++
>  5 files changed, 128 insertions(+), 11 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.base/gcore-nonstop.c
>  create mode 100644 gdb/testsuite/gdb.base/gcore-nonstop.exp
>
> diff --git a/gdb/gcore.c b/gdb/gcore.c
> index 519007714e5..664318b4161 100644
> --- a/gdb/gcore.c
> +++ b/gdb/gcore.c
> @@ -34,6 +34,7 @@
>  #include "regset.h"
>  #include "gdb_bfd.h"
>  #include "readline/tilde.h"
> +#include "infrun.h"
>  #include
>  #include "gdbsupport/gdb_unlinker.h"
>  #include "gdbsupport/byte-vector.h"
> @@ -131,6 +132,10 @@ gcore_command (const char *args, int from_tty)
>    if (!target_has_execution ())
>      noprocess ();
>
> +  scoped_restore_current_thread restore_current_thread;
> +  scoped_disable_commit_resumed disable_commit_resume ("generating coredump");
> +  stop_all_threads ("generating coredump");
> +
>    if (args && *args)
>      corefilename.reset (tilde_expand (args));
>    else
> @@ -161,6 +166,9 @@ gcore_command (const char *args, int from_tty)
>      }
>
>    gdb_printf ("Saved corefile %s\n", corefilename.get ());

So if something goes wrong dumping core, and we throw an error, we end up with
all threads internally stopped, while "info threads" will show all threads as
"running". That could be fixed with a scoped_finish_thread_state.
(Alternatively we could instead just error out if a thread is running.)

If we go with the auto-stop, then this should be documented in the manual, and
get a NEWS entry, IMHO.
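
To make the concern concrete, here is a minimal standalone sketch of the
scope-guard idea, outside of GDB. Everything in it is hypothetical and only for
illustration: toy_thread, scoped_sync_reported_state, toy_gcore and
states_consistent are made-up names, not GDB APIs, and the guard only
approximates what I'd expect a scoped_finish_thread_state to give us here,
namely that the state reported to the user is brought back in sync with the
internal state even when the dump throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for per-thread bookkeeping: what the debugger
// core knows versus what "info threads" would report to the user.
struct toy_thread
{
  bool internally_running = true;
  bool reported_running = true;
};

// Hypothetical guard: when the scope is left, normally or through an
// exception, make the reported state match the internal state.
class scoped_sync_reported_state
{
public:
  explicit scoped_sync_reported_state (std::vector<toy_thread> &threads)
    : m_threads (threads)
  {}

  ~scoped_sync_reported_state ()
  {
    for (toy_thread &t : m_threads)
      t.reported_running = t.internally_running;
  }

  scoped_sync_reported_state (const scoped_sync_reported_state &) = delete;
  scoped_sync_reported_state &operator=
    (const scoped_sync_reported_state &) = delete;

private:
  std::vector<toy_thread> &m_threads;
};

// Toy version of the gcore flow: stop everything, dump, restart.
// FAIL simulates an error thrown while writing the core file.
void
toy_gcore (std::vector<toy_thread> &threads, bool fail)
{
  assert (!threads.empty ());

  scoped_sync_reported_state sync (threads);

  // Stop internally only; the reported state is left untouched.
  for (toy_thread &t : threads)
    t.internally_running = false;

  if (fail)
    throw std::runtime_error ("dump failed");

  // Restart on success.
  for (toy_thread &t : threads)
    t.internally_running = true;
}

// Return true if reported and internal states agree for every thread.
bool
states_consistent (const std::vector<toy_thread> &threads)
{
  for (const toy_thread &t : threads)
    if (t.reported_running != t.internally_running)
      return false;
  return true;
}
```

Without the guard, the failing path in this toy leaves every thread with
internally_running == false but reported_running == true, which is the
mismatch described above. With it, both paths end up consistent: threads are
reported as running after a successful dump, and as stopped after a failed one.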
> +
> +  restart_threads (nullptr, nullptr);
> +  disable_commit_resume.reset_and_commit ();
>  }
>
>  static enum bfd_architecture
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 1957e8020dd..34fcb2f92dd 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -96,9 +96,6 @@ static void resume (gdb_signal sig);
>
>  static void wait_for_inferior (inferior *inf);
>
> -static void restart_threads (struct thread_info *event_thread,
> -                             inferior *inf = nullptr);
> -
>  static bool start_step_over (void);
>
>  static bool step_over_info_valid_p (void);
> @@ -5889,18 +5886,15 @@ handle_inferior_event (struct execution_control_state *ecs)
>      }
>  }
>
> -/* Restart threads back to what they were trying to do back when we
> -   paused them (because of an in-line step-over or vfork, for example).
> -   The EVENT_THREAD thread is ignored (not restarted).
> -
> -   If INF is non-nullptr, only resume threads from INF. */
> +/* See infrun.h. */
>
> -static void
> +void
>  restart_threads (struct thread_info *event_thread, inferior *inf)
>  {
>    INFRUN_SCOPED_DEBUG_START_END ("event_thread=%s, inf=%d",
> -                                 event_thread->ptid.to_string ().c_str (),
> -                                 inf != nullptr ? inf->num : -1);
> +                                 (event_thread != nullptr
> +                                  ? event_thread->ptid.to_string ().c_str ()
> +                                  : "None"), inf != nullptr ? inf->num : -1);
>
>    gdb_assert (!step_over_info_valid_p ());
>
> diff --git a/gdb/infrun.h b/gdb/infrun.h
> index 0c7c55eabec..81d00f6da7e 100644
> --- a/gdb/infrun.h
> +++ b/gdb/infrun.h
> @@ -173,6 +173,15 @@ extern void nullify_last_target_wait_ptid ();
>     all threads of all inferiors. */
>  extern void stop_all_threads (const char *reason, inferior *inf = nullptr);
>
> +/* Restart threads back to what they were trying to do back when we
> +   paused them (because of an in-line step-over or vfork, for example).
> +   The EVENT_THREAD thread, if non-nullptr, is ignored (not restarted).
> +
> +   If INF is non-nullptr, only resume threads from INF. */
> +
> +extern void restart_threads (struct thread_info *event_thread,
> +                             inferior *inf = nullptr);
> +
>  extern void prepare_for_detach (void);
>
>  extern void fetch_inferior_event ();
> diff --git a/gdb/testsuite/gdb.base/gcore-nonstop.c b/gdb/testsuite/gdb.base/gcore-nonstop.c
> new file mode 100644
> index 00000000000..191a1a26849
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/gcore-nonstop.c
> @@ -0,0 +1,44 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2022 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program. If not, see . */
> +
> +#include
> +
> +static pthread_barrier_t barrier;
> +
> +static void *
> +worker_func (void *)
> +{
> +  pthread_barrier_wait (&barrier);
> +  return NULL;
> +}
> +
> +int
> +main (void)
> +{
> +  pthread_t worker_thread;
> +  pthread_barrier_init (&barrier, NULL, 2);
> +
> +  pthread_create (&worker_thread, NULL, worker_func, NULL);
> +
> +  /* Break here. */
> +
> +  pthread_barrier_wait (&barrier);
> +  pthread_join (worker_thread, NULL);
> +  pthread_barrier_destroy (&barrier);
> +
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.base/gcore-nonstop.exp b/gdb/testsuite/gdb.base/gcore-nonstop.exp
> new file mode 100644
> index 00000000000..6c9ed4ad342
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/gcore-nonstop.exp
> @@ -0,0 +1,62 @@
> +# Copyright 2022 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see .
> +
> +# This testcase checks that when in non-stop mode with some threads running
> +# the gcore command can interrupt all threads, generate a core dump and
> +# restart threads as required.
> +
> +standard_testfile
> +
> +if { [prepare_for_testing "failed to prepare" \
> +        ${testfile} ${srcfile} {threads debug}] } {
> +    return
> +}
> +
> +gdb_test_no_output "set non-stop on"
> +set lineno [gdb_get_line_number "Break here"]
> +if { ![runto $lineno] } {
> +    return
> +}
> +
> +# We should be stopped in thread 1 while thread 2 is running

Please add missing end period.

> +gdb_test_sequence "info threads" "info threads" {
> +    {Id\s+Target Id\s+Frame}
> +    {\*\s+1[^\n]*\n}
> +    {\s+2\s+[^\n]*\(running\)[^\n]*\n}
> +}
> +
> +set corefile [standard_output_file "corefile"]
> +if {![gdb_gcore_cmd $corefile "generate corefile"]} {
> +    # gdb_gcore_cmd did would generate a unsupported.

"did would" does not parse.

> +    return
> +}
> +
> +# After the corefile is generated, thread 2 should be back running
> +# and thread 1 should still be selectd

selectd -> selected

Also, missing period.

> +gdb_test_sequence "info threads" "correct thread selection after gcore" {
> +    {Id\s+Target Id\s+Frame}
> +    {\*\s+1[^\n]*\n}
> +    {\s+2\s+[^\n]*\(running\)[^\n]*\n}

Might be good to also check that thread 1 is still stopped, and stopped where
it was stopped before.
> +}
> +
> +clean_restart $binfile
> +gdb_test "core-file $corefile" "Core was generated by.*" "load corefile"
> +
> +# The corefile has the 2 threads

Missing period.

> +gdb_test_sequence "info threads" "threads in corefile" {
> +    {Id\s+Target Id\s+Frame}
> +    {\s+1\s+Thread[^\n]*\n}
> +    {\s+2\s+Thread[^\n]*\n}
> +}
>
> base-commit: a13886e2198beb78b81c59839043b021ce6df78a
>