From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lndn.lancelotsix.com (vps-42846194.vps.ovh.net [IPv6:2001:41d0:801:2000::2400]) by sourceware.org (Postfix) with ESMTPS id 1724D3858C1F for ; Wed, 16 Feb 2022 00:28:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1724D3858C1F Received: from Plymouth (unknown [IPv6:2a02:390:9086:0:5a9e:8aa:9872:5fe7]) by lndn.lancelotsix.com (Postfix) with ESMTPSA id A4AB480D5D; Wed, 16 Feb 2022 00:28:26 +0000 (UTC) Date: Wed, 16 Feb 2022 00:28:22 +0000 From: Lancelot SIX To: Simon Marchi Cc: gdb-patches@sourceware.org, Simon Marchi Subject: Re: [PATCH v2 6/9] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on) Message-ID: <20220216002822.g3g3um6mwfwtcv4q@Plymouth> References: <20220118040937.730282-1-simon.marchi@polymtl.ca> <20220118040937.730282-7-simon.marchi@polymtl.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20220118040937.730282-7-simon.marchi@polymtl.ca> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.5.11 (lndn.lancelotsix.com [0.0.0.0]); Wed, 16 Feb 2022 00:28:26 +0000 (UTC) X-Spam-Status: No, score=-10.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_SBL_CSS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Feb 2022 00:28:32 -0000 Hi, I have a question (in the test), and few styling related remarks I have added below in the text. On Mon, Jan 17, 2022 at 11:09:34PM -0500, Simon Marchi via Gdb-patches wrote: > From: Simon Marchi > > There is a problem with how GDB handles a vfork happening in a > multi-threaded program. This problem was reported to me by somebody not > using vfork directly, but using system(3) in a multi-threaded program, > which may be implemented using vfork. > > This patch only deals about the follow-fork-mode=parent, > detach-on-fork=on case, because it would be too much to chew at once to > fix the bugs in the other cases as well (I tried). > > The problem > ----------- > > When a program vforks, the parent thread is suspended by the kernel > until the child process exits or execs. Specifically, in a > multi-threaded program, only the thread that called vfork is suspended, > other threads keep running freely. This is documented in the vfork(2) > man page ("Caveats" section). > > Let's suppose GDB is handling a vfork and the user's desire is to detach > from the child. Before detaching the child, GDB must remove the software > breakpoints inserted in the shared parent/child address space, in case > there's a breakpoint in the path the child is going to take before > exec'ing or exit'ing (unlikely, but possible). Otherwise the child could > hit a breakpoint instruction while running outside the control of GDB, > which would make it crash. GDB must also avoid re-inserting breakpoints > in the parent as long as it didn't receive the "vfork done" event (that > is, when the child has exited or execed): since the address space is > shared with the child, that would re-insert breakpoints in the child > process also. So what GDB does is: > > 1. Receive "vfork" event for the parent > 2. Remove breakpoints from the (shared) address space and set > program_space::breakpoints_not_allowed to avoid re-inserting them > 3. Detach from the child thread > 4. Resume the parent > 5. Wait for and receive "vfork done" event for the parent > 6. Clean program_space::breakpoints_not_allowed and re-insert > breakpoints > 7. Resume the parent > > Resuming the parent at step 4 is necessary in order for the kernel to > report the "vfork done" event. The kernel won't report a ptrace event > for a thread that is ptrace-stopped. But the theory behind this is that > between steps 4 and 5, the parent won't actually do any progress even > though it is ptrace-resumed, because the kernel keeps it suspended, > waiting for the child to exec or exit. So it doesn't matter for that > thread if breakpoints are not inserted. > > The problem is when the program is multi-threaded. In step 4, GDB > resumes all threads of the parent. The thread that did the vfork stays > suspended by the kernel, so that's fine. But other threads are running > freely while breakpoints are removed, which is a problem because they > could miss a breakpoint that they should have hit. > > The problem is present with all-stop and non-stop targets. The only > difference is that with an all-stop targets, the other threads are > stopped by the target when it reports the vfork event and are resumed by > the target when GDB resumes the parent. With a non-stop target, the > other threads are simply never stopped. > > The fix > ------- > > There many combinations of settings to consider (all-stop/non-stop, > target-non-stop on/off, follow-fork-mode parent/child, detach-on-fork > on/off, schedule-multiple on/off), but for this patch I restrict the > scope to follow-fork-mode=parent, detach-on-fork=on. That's the > "default" case, where we detach the child and keep debugging the > parent. I tried to fix them all, but it's just too much to do at once. > The code paths and behaviors for when we don't detach the child are > completely different. > > The guiding principle for this patch is that all threads of the vforking > inferior should be stopped as long as breakpoints are removed. This is > similar to handling in-line step-overs, in a way. > > For non-stop targets (the default on Linux native), this is what > happens: > > - In follow_fork, we call stop_all_threads to stop all threads of the > inferior > - In follow_fork_inferior, we record the vfork parent thread in > inferior::thread_waiting_for_vfork_done > - Back in handle_inferior_event, we call keep_going, which resumes only > the event thread (this is already the case, with a non-stop target). > This is the thread that will be waiting for vfork-done. > - When we get the vfork-done event, we go in the (new) handle_vfork_done > function to restart the previously stopped threads. > > In the same scenario, but with an all-stop target: > > - In follow_fork, no need to stop all threads of the inferior, the > target has stopped all threads of all its inferiors before returning > the event. > - In follow_fork_inferior, we record the vfork parent thread in > inferior::thread_waiting_for_vfork_done. > - Back in handle_inferior_event, we also call keep_going. However, we > only want to resume the event thread here, not all inferior threads. > In internal_resume_ptid (called by resume_1), we therefore now check > whether one of the inferiors we are about to resume has > thread_waiting_for_vfork_done set. If so, we only resume that > thread. When resuming multiple inferiors, one vforking and one not > vforking, we could resume the vforking thread from the vforking > inferior plus all threads from the vforking inferior. However, the ^ e Did you forget a "non-" here? > target_resume interface today does not allow this. > - When we get the vfork-done event, the existing call to keep_going > naturally resumes all threads. > > Testing-wise, add a test that tries to make the main thread hit a > breakpoint while a secondary thread calls vfork. Without the fix, the > main thread keeps going while breakpoints are removed, resulting in a > missed breakpoint and the program exiting. > > Change-Id: I20eb78e17ca91f93c19c2b89a7e12c382ee814a1 > --- > gdb/infrun.c | 134 ++++++++++++++++-- > .../gdb.threads/vfork-multi-thread.c | 74 ++++++++++ > .../gdb.threads/vfork-multi-thread.exp | 96 +++++++++++++ > 3 files changed, 296 insertions(+), 8 deletions(-) > create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-thread.c > create mode 100644 gdb/testsuite/gdb.threads/vfork-multi-thread.exp > > diff --git a/gdb/infrun.c b/gdb/infrun.c > index c976e3efbfb4..75252be22038 100644 > --- a/gdb/infrun.c > +++ b/gdb/infrun.c > @@ -95,6 +95,11 @@ static void resume (gdb_signal sig); > > static void wait_for_inferior (inferior *inf); > > +static void restart_threads (struct thread_info *event_thread, > + inferior *inf = nullptr); > + > +static bool start_step_over (void); > + > /* Asynchronous signal handler registered as event loop source for > when we have pending events ready to be passed to the core. */ > static struct async_event_handler *infrun_async_inferior_event_token; > @@ -431,6 +436,8 @@ holding the child stopped. Try \"set detach-on-fork\" or \ > inferior *parent_inf = current_inferior (); > inferior *child_inf = nullptr; > > + gdb_assert (parent_inf->thread_waiting_for_vfork_done == nullptr); > + > if (!follow_child) > { > /* Detach new forked process? */ > @@ -640,7 +647,6 @@ holding the child stopped. Try \"set detach-on-fork\" or \ > child_inf->pending_detach = 0; > parent_inf->vfork_child = child_inf; > parent_inf->pending_detach = detach_fork; > - parent_inf->thread_waiting_for_vfork_done = nullptr; > } > else if (detach_fork) > { > @@ -773,6 +779,12 @@ follow_fork () > parent = inferior_ptid; > child = tp->pending_follow.child_ptid (); > > + /* If handling a vfork, stop all the inferior's threads, they will be > + restarted when the vfork shared region is complete. */ > + if (tp->pending_follow.kind () == TARGET_WAITKIND_VFORKED > + && target_is_non_stop_p ()) > + stop_all_threads ("handling vfork", tp->inf); > + > process_stratum_target *parent_targ = tp->inf->process_target (); > /* Set up inferior(s) as specified by the caller, and tell the > target to do whatever is necessary to follow either parent > @@ -1034,6 +1046,53 @@ handle_vfork_child_exec_or_exit (int exec) > } > } > > +/* Handle TARGET_WAITKIND_VFORK_DONE. */ > + > +static void > +handle_vfork_done (thread_info *event_thread) > +{ > + /* We only care about this event if inferior::thread_waiting_for_vfork_done is > + set, that is if we are waiting for a vfork child not under our control > + (because we detached it) to exec or exit. > + > + If an inferior has vforked and we are debugging the child, we don't use > + the vfork-done event to get notified about the end of the shared address > + space window). We rely instead on the child's exec or exit event, and the ^ This closing parens has no corresponding opening one. > + inferior::vfork_{parent,child} fields are used instead. See > + handle_vfork_child_exec_or_exit for that. */ > + if (event_thread->inf->thread_waiting_for_vfork_done == nullptr) > + { > + infrun_debug_printf ("not waiting for a vfork-done event"); > + return; > + } > + > + INFRUN_SCOPED_DEBUG_ENTER_EXIT; > + > + /* We stopped all threads (other than the vforking thread) of the inferior in > + follow_fork and kept them stopped until now. It should therefore not be > + possible for another thread to have reported a vfork during that window. > + If THREAD_WAITING_FOR_VFORK_DONE is set, it has to be the same thread whose > + vfork-done we are handling right now. */ > + gdb_assert (event_thread->inf->thread_waiting_for_vfork_done == event_thread); > + > + event_thread->inf->thread_waiting_for_vfork_done = nullptr; > + event_thread->inf->pspace->breakpoints_not_allowed = 0; > + > + /* On non-stop targets, we stopped all the inferior's threads in follow_fork, > + resume them now. On all-stop targets, everything that needs to be resumed > + will be when we resume the event thread. */ > + if (target_is_non_stop_p ()) > + { > + /* restart_threads and start_step_over may change the current thread, make > + sure we leave the event thread as the current thread. */ > + scoped_restore_current_thread restore_thread; > + > + insert_breakpoints (); > + restart_threads (event_thread, event_thread->inf); > + start_step_over (); > + } > +} > + > /* Enum strings for "set|show follow-exec-mode". */ > > static const char follow_exec_mode_new[] = "new"; > @@ -1908,6 +1967,16 @@ start_step_over (void) > continue; > } > > + if (tp->inf->thread_waiting_for_vfork_done) Should be tp->inf->thread_waiting_for_vfork_done != nullptr > + { > + /* When we stop all threads, handling a vfork, any thread in the step > + over chain remains there. A user could also try to continue a > + thread stopped at a breakpoint while another thread is waiting for > + a vfork-done event. In any case, we don't want to start a step > + over right now. */ > + continue; > + } > + > /* Remove thread from the THREADS_TO_STEP chain. If anything goes wrong > while we try to prepare the displaced step, we don't add it back to > the global step over chain. This is to avoid a thread staying in the > @@ -2143,8 +2212,41 @@ internal_resume_ptid (int user_step) > return a wildcard ptid. */ > if (target_is_non_stop_p ()) > return inferior_ptid; > - else > - return user_visible_resume_ptid (user_step); > + > + /* The rest of the function assumes non-stop==off and > + target-non-stop==off. > + > + If a thread is waiting for a vfork-done event, it means breakpoints are out > + for this inferior (well, program space in fact). We don't want to resume > + any thread other than the one waiting for vfork done, otherwise these other > + threads could miss breakpoints. So if a thread in the resumption set is > + waiting for a vfork-done event, resume only that thread. > + > + The resumption set width depends on whether schedule-multiple is on or off. > + > + Note that if the target_resume interface was more flexible, we could be > + smarter here when schedule-multiple is on. For example, imagine 3 > + inferiors with 2 threads each (1.1, 1.2, 2.1, 2.2, 3.1 and 3.2). Threads > + 2.1 and 3.2 are both waiting for a vfork-done event. Then we could ask the > + target(s) to resume: > + > + - All threads of inferior 1 > + - Thread 2.1 > + - Thread 3.2 > + > + Since we don't have that flexibility (we can only pass one ptid), just > + resume the first thread waiting for a vfork-done event we find (e.g. thread > + 2.1). */ > + if (sched_multi) > + { > + for (inferior *inf : all_non_exited_inferiors ()) > + if (inf->thread_waiting_for_vfork_done != nullptr) > + return inf->thread_waiting_for_vfork_done->ptid; > + } > + else if (current_inferior ()->thread_waiting_for_vfork_done != nullptr) > + return current_inferior ()->thread_waiting_for_vfork_done->ptid; > + > + return user_visible_resume_ptid (user_step); > } > > /* Wrapper for target_resume, that handles infrun-specific > @@ -3258,6 +3360,19 @@ proceed (CORE_ADDR addr, enum gdb_signal siggnal) > continue; > } > > + /* If a thread of that inferior is waiting for a vfork-done > + (for a detached vfork child to exec or exit), breakpoints are > + removed. We must not resume any thread of that inferior, other > + than the one waiting for the vfork-done. */ > + if (tp->inf->thread_waiting_for_vfork_done != nullptr > + && tp != tp->inf->thread_waiting_for_vfork_done) > + { > + infrun_debug_printf ("[%s] another thread of this inferior is " > + "waiting for vfork-done", > + tp->ptid.to_string ().c_str ()); > + continue; > + } > + > infrun_debug_printf ("resuming %s", > tp->ptid.to_string ().c_str ()); > > @@ -3268,7 +3383,12 @@ proceed (CORE_ADDR addr, enum gdb_signal siggnal) > error (_("Command aborted.")); > } > } > - else if (!cur_thr->resumed () && !thread_is_in_step_over_chain (cur_thr)) > + else if (!cur_thr->resumed () > + && !thread_is_in_step_over_chain (cur_thr) > + /* In non-stop, forbid resume a thread if some other thread of > + that inferior is waiting for a vfork-done event (this means > + breakpoints are out for this inferior). */ > + && !(non_stop && cur_thr->inf->thread_waiting_for_vfork_done)) Should be cur_thr->inf->thread_waiting_for_vfork_done != nullptr > { > /* The thread wasn't started, and isn't queued, run it now. */ > reset_ecs (ecs, cur_thr); > @@ -3758,8 +3878,6 @@ struct wait_one_event > }; > > static bool handle_one (const wait_one_event &event); > -static void restart_threads (struct thread_info *event_thread, > - inferior *inf = nullptr); > > /* Prepare and stabilize the inferior for detaching it. E.g., > detaching while a thread is displaced stepping is a recipe for > @@ -5640,8 +5758,8 @@ handle_inferior_event (struct execution_control_state *ecs) > > context_switch (ecs); > > - current_inferior ()->thread_waiting_for_vfork_done = nullptr; > - current_inferior ()->pspace->breakpoints_not_allowed = 0; > + handle_vfork_done (ecs->event_thread); > + gdb_assert (inferior_thread () == ecs->event_thread); > > if (handle_stop_requested (ecs)) > return; > diff --git a/gdb/testsuite/gdb.threads/vfork-multi-thread.c b/gdb/testsuite/gdb.threads/vfork-multi-thread.c > new file mode 100644 > index 000000000000..cd3dfcc1e653 > --- /dev/null > +++ b/gdb/testsuite/gdb.threads/vfork-multi-thread.c > @@ -0,0 +1,74 @@ > +#include > +#include > +#include > +#include > +#include > + > +// Start with: > +// > +// ./gdb -nx -q --data-directory=data-directory repro -ex "tb break_here_first" -ex r -ex "b should_break_here" > +// > +// Then do "continue". > +// > +// The main thread will likely cross should_break_here while we are handling > +// the vfork and breakpoints are removed, therefore missing the breakpoint. > + > +static volatile int release_vfork = 0; > +static volatile int release_main = 0; > + > +static void *vforker(void *arg) shouldn’t vforker shoult be at column 0, even in test code? > +{ > + while (!release_vfork); > + > + pid_t pid = vfork (); > + if (pid == 0) { > + /* A vfork child is not supposed to mess with the state of the program, > + but it is helpful for the purpose of this test. */ > + release_main = 1; > + _exit(7); > + } > + > + int stat; > + int ret = waitpid (pid, &stat, 0); > + assert (ret == pid); > + assert (WIFEXITED (stat)); > + assert (WEXITSTATUS (stat) == 7); > + > + return NULL; > +} > + > +static void should_break_here(void) {} > + > +int main() Same, main at column 0. > +{ > + > + pthread_t thread; > + int ret = pthread_create(&thread, NULL, vforker, NULL); > + assert(ret == 0); space before paren. > + > + /* We break here first, while the thread is stuck on `!release_fork`. */ > + release_vfork = 1; > + > + /* We set a breakpoint on should_break_here. > + > + We then set "release_fork" from the debugger and continue. The main > + thread hangs on `!release_main` while the non-main thread vforks. During > + the window of time where the two processes have a shared address space > + (after vfork, before _exit), GDB removes the breakpoints from the address > + space. During that window, only the vfork-ing thread (the non-main > + thread) is frozen by the kernel. The main thread is free to execute. The > + child process sets `release_main`, releasing the main thread. A buggy GDB > + would let the main thread execute during that window, leading to the > + breakpoint on should_break_here being missed. A fixed GDB does not resume > + the threads of the vforking process other than the vforking thread. When > + the vfork child exits, the fixed GDB resumes the main thread, after > + breakpoints are reinserted, so the breakpoint is not missed. */ > + > + while (!release_main); > + > + should_break_here(); Space before paren. > + > + pthread_join (thread, NULL); > + > + return 6; > +} > diff --git a/gdb/testsuite/gdb.threads/vfork-multi-thread.exp b/gdb/testsuite/gdb.threads/vfork-multi-thread.exp > new file mode 100644 > index 000000000000..d405411be012 > --- /dev/null > +++ b/gdb/testsuite/gdb.threads/vfork-multi-thread.exp > @@ -0,0 +1,96 @@ > +# Copyright 2022 Free Software Foundation, Inc. > + > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see . > + > +# Test that a multi-threaded program doing a vfork doesn't miss breakpoints. > +# > +# When a program vforks, its address space is shared with the parent. When we > +# detach a vfork child, we must keep breakpoints out of that shared address space > +# until the child either exits or execs, so that the child does not hit a > +# breakpoint while out of GDB's control. During that time, threads from > +# the parent must be held stopped, otherwise they could miss breakpoints. > +# > +# The thread that did the vfork is suspended by the kernel, so it's not a > +# concern. The other threads need to be manually stopped by GDB and resumed > +# once the vfork critical region is done. > +# > +# This test spawns one thread that calls vfork. Meanwhile, the main thread > +# crosses a breakpoint. A buggy GDB would let the main thread run while > +# breakpoints are removed, so the main thread would miss the breakpoint and run > +# until exit. b> + > +standard_testfile > + > +if { [build_executable "failed to prepare" ${testfile} ${srcfile} {debug pthreads}] } { > + return > +} > + > +set any "\[^\r\n\]*" > + > +# A bunch of util procedures to continue an inferior to an expected point. > + > +proc continue_to_parent_breakpoint {} { > + gdb_test "continue" \ > + "hit Breakpoint .* should_break_here .*" \ > + "continue parent to breakpoint" > +} > + > +proc continue_to_parent_end {} { > + gdb_test "continue" "Inferior 1.*exited with code 06.*" \ > + "continue parent to end" > +} > + > +# Run the test with the given GDB settings. > + > +proc do_test { target-non-stop non-stop follow-fork-mode detach-on-fork schedule-multiple } { > + save_vars { ::GDBFLAGS } { > + append ::GDBFLAGS " -ex \"maintenance set target-non-stop ${target-non-stop}\"" > + append ::GDBFLAGS " -ex \"set non-stop ${non-stop}\"" > + clean_restart ${::binfile} > + } > + > + gdb_test_no_output "set follow-fork-mode ${follow-fork-mode}" > + gdb_test_no_output "set detach-on-fork ${detach-on-fork}" > + gdb_test_no_output "set schedule-multiple ${schedule-multiple}" OOC, why do you use GDBFLAGS to set 2 settings and gdb_test_no_output for 3 settings? > + > + # The message about thread 2 of inferior 1 exiting happens at a somewhat > + # unpredictable moment, it's simpler to silence it than to try to match it. > + gdb_test_no_output "set print thread-events off" > + > + if { ![runto_main] } { > + return > + } > + > + # The main thread is expected to hit this breakpoint. > + gdb_test "break should_break_here" "Breakpoint $::decimal at .*" > + > + continue_to_parent_breakpoint > + continue_to_parent_end > +} > + > +# We only test with follow-fork-mode=parent and detach-on-fork=on at the > +# moment, but the loops below are written to make it easy to add other values > +# on these axes in the future. > + > +foreach_with_prefix target-non-stop {auto on off} { > + foreach_with_prefix non-stop {off on} { > + foreach_with_prefix follow-fork-mode {parent} { > + foreach_with_prefix detach-on-fork {on} { > + foreach_with_prefix schedule-multiple {off on} { > + do_test ${target-non-stop} ${non-stop} ${follow-fork-mode} ${detach-on-fork} ${schedule-multiple} > + } > + } > + } > + } > +} > -- > 2.34.1 > Best, Lancelot.