Date: Mon, 27 Mar 2023 14:13:46 -0700
To: Simon Marchi, gdb-patches@sourceware.org
References:
 <20230228181845.99936-1-jhb@FreeBSD.org> <20230228181845.99936-7-jhb@FreeBSD.org> <8fb811a5-f363-d3aa-5b63-4fcc434b3e17@simark.ca>
From: John Baldwin
Subject: Re: [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes.
In-Reply-To: <8fb811a5-f363-d3aa-5b63-4fcc434b3e17@simark.ca>

On 3/20/23 12:55 PM, Simon Marchi wrote:
> On 2/28/23 13:18, John Baldwin wrote:
>> I did not fully understand the requirements of multiple process
>> support when I enabled it previously and several parts were broken.
>> In particular, the resume method was only resuming a single process,
>> and wait was not stopping other processes when reporting an event.
>>
>> To support multiple running inferiors, add a new per-inferior
>> structure which tracks the number of existing and running LWPs for
>> each process.  The structure also stores a ptid_t describing the
>> set of LWPs currently resumed for each process.
>
> Ah, that sounds good, related to my comments on previous patches about
> tracking the resume state for each LWP.  I don't know if it needs to be
> per-inferior though, versus a global table like Linux does (but perhaps
> it helps, I'll see when reading the code).
>
>>
>> For the resume method, iterate over all non-exited inferiors resuming
>> each process matching the passed in ptid rather than only resuming the
>> current inferior's process for a wildcard ptid.
>> If a resumed process has a pending event, don't actually resume the
>> process, but other matching processes without a pending event are
>> still resumed in case the later call to the wait method requests an
>> event from one of the processes without a pending event.
>>
>> For the wait method, stop other running processes before returning an
>> event to the core.  When stopping a process, first check to see if an
>> event is already pending.  If it is, queue the event to be reported
>> later.  If not, send a SIGSTOP to the process and wait for it to stop.
>> If the event reported by the wait is not for the SIGSTOP, queue the
>> event and remember to ignore a future SIGSTOP event for the process.
>>
>> Note that, unlike the Linux native target, entire processes are
>> stopped rather than individual LWPs.  In FreeBSD one can only wait on
>> processes (via pid), not for an event from a specific thread.
>>
>> Other changes in this commit handle bookkeeping for the per-inferior
>> data such as purging the data in the mourn_inferior method and
>> migrating the data to the new inferior in the follow_exec method.  The
>> per-inferior data is created in the attach, create_inferior, and
>> follow_fork methods.
>> ---
>>  gdb/fbsd-nat.c | 403 +++++++++++++++++++++++++++++++++++++------------
>>  gdb/fbsd-nat.h |   8 +
>>  2 files changed, 317 insertions(+), 94 deletions(-)
>>
>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>> index 3f7278c6ea0..14b31ddd86e 100644
>> --- a/gdb/fbsd-nat.c
>> +++ b/gdb/fbsd-nat.c
>> @@ -54,11 +54,26 @@
>>  #define PT_SETREGSET 43 /* Set a target register set */
>>  #endif
>>
>> -/* Filter for ptid's allowed to report events from wait.  Normally set
>> -   in resume, but also reset to minus_one_ptid in create_inferior and
>> -   attach.  */
>> +/* Information stored about each inferior.  */
>>
>> -static ptid_t resume_ptid;
>> +struct fbsd_inferior_info
>> +{
>> +  /* Filter for resumed LWPs which can report events from wait.  */
>> +  ptid_t resumed_lwps = null_ptid;
>> +
>> +  /* Number of LWPs this process contains.  */
>> +  unsigned int num_lwps = 0;
>> +
>> +  /* Number of LWPs currently running.  */
>> +  unsigned int running_lwps = 0;
>> +
>> +  /* Have a pending SIGSTOP event that needs to be discarded.  */
>> +  bool pending_sigstop = false;
>> +};
>
> Ok, it's not exactly what I expected, but I will keep on reading.
>
> Long term, I don't think the resumed_lwps field will be enough to
> describe the resume state of individual threads.  Actually, to
> complement the example I gave on an earlier patch, I guess you could do
> that today?
>
>   (gdb) set scheduler-locking on
>   (gdb) thread 1
>   (gdb) continue &    # only supposed to resume thread 1
>   (gdb) thread 2
>   (gdb) continue &    # only supposed to resume thread 2
>
> Here, resumed_lwps would end up as (pid, thread_2_lwp, 0), right?

Ok, that answers a question I had then.  I do have a follow-up to this
that I haven't posted (I mentioned it in the cover letter) where I
replace the single ptid with an unordered_set<> of LWPs belonging to
the process that should be resumed.  However, in order to make this
work I had to make all "real" resume calls deferred using
commit_resumed (but that only kind of works; I have to explicitly do
commit_resumed at the start of wait() because the core doesn't do it
for me).

Basically, in my follow-up the fbsd_nat::resume method just modifies
state in the per-inferior struct to keep track of at most one pending
signal/step and a set of LWPs to resume.  Then when either
commit_resumed or wait is called, I walk all inferiors for the native
target doing the actual resume (PT_CONTINUE) after using
PT_SUSPEND/PT_RESUME on the set of LWPs that need to actually run for
the given process.

Currently though I get new regressions with that approach compared to
this series.
:(

FWIW, with this series what would happen for your example above is
that an assert() trips in fbsd_nat::resume when it tries to resume the
second thread when the process is already resumed.

>> +
>> +/* Per-inferior data key.  */
>> +
>> +static const registry<inferior>::key<fbsd_inferior_info> fbsd_inferior_data;
>>
>>  /* If an event is triggered asynchronously (fake vfork_done events) or
>>     occurs when the core is not expecting it, a pending event is
>> @@ -95,21 +110,27 @@ have_pending_event (ptid_t filter)
>>    return false;
>>  }
>>
>> -/* Helper method called by the target wait method.  Returns true if
>> -   there is a pending event matching resume_ptid.  If there is a
>> -   matching event, PTID and *STATUS contain the event details, and the
>> -   event is removed from the pending list.  */
>> +/* Returns true if there is a pending event for a resumed process
>> +   matching FILTER.  If there is a matching event, PTID and *STATUS
>> +   contain the event details, and the event is removed from the
>> +   pending list.  */
>>
>>  static bool
>> -take_pending_event (ptid_t &ptid, target_waitstatus *status)
>> +take_pending_event (fbsd_nat_target *target, ptid_t filter, ptid_t &ptid,
>> +                    target_waitstatus *status)
>>  {
>>    for (auto it = pending_events.begin (); it != pending_events.end (); it++)
>> -    if (it->ptid.matches (resume_ptid))
>> +    if (it->ptid.matches (filter))
>>        {
>> -        ptid = it->ptid;
>> -        *status = it->status;
>> -        pending_events.erase (it);
>> -        return true;
>> +        inferior *inf = find_inferior_ptid (target, it->ptid);
>> +        fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
>> +        if (it->ptid.matches (info->resumed_lwps))
>> +          {
>> +            ptid = it->ptid;
>> +            *status = it->status;
>> +            pending_events.erase (it);
>> +            return true;
>> +          }
>
> If that code was kept as-is, I think take_pending_event should be a
> method of fbsd_nat_target, rather than passing it manually.

Ok.
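For anyone following along without the tree handy, here is a rough,
self-contained sketch of the two-stage check in the hunk above: an event
is taken only if it matches the caller's filter and its process currently
has a resumed set that covers it.  Ptid, ProcInfo, and PendingEvent are
simplified stand-ins I made up for illustration (no tid field, a std::map
keyed by pid instead of the per-inferior registry), so treat this as a
model of the idea rather than the code in the patch.

```cpp
#include <cassert>
#include <list>
#include <map>

// Simplified stand-in for GDB's ptid_t (assumption: pid and lwp only).
struct Ptid
{
  int pid;
  long lwp;

  bool operator== (const Ptid &o) const
  { return pid == o.pid && lwp == o.lwp; }

  // Roughly mirrors ptid_t::matches: (-1, 0) matches everything,
  // (pid, 0) matches any LWP of that pid, otherwise exact equality.
  bool matches (const Ptid &filter) const
  {
    if (filter.pid == -1)
      return true;
    if (filter.lwp == 0)
      return pid == filter.pid;
    return *this == filter;
  }
};

// Hypothetical pending event record; status is just an int here.
struct PendingEvent { Ptid ptid; int status; };

// Hypothetical per-process state; {0, 0} plays the role of null_ptid.
struct ProcInfo { Ptid resumed_lwps; };

// Take the first pending event that matches FILTER and whose process
// has resumed LWPs covering the event's ptid.
bool
take_pending_event (std::list<PendingEvent> &pending,
                    const std::map<int, ProcInfo> &procs,
                    Ptid filter, PendingEvent &out)
{
  for (auto it = pending.begin (); it != pending.end (); ++it)
    {
      if (!it->ptid.matches (filter))
        continue;

      auto info = procs.find (it->ptid.pid);
      if (info == procs.end ()
          || !it->ptid.matches (info->second.resumed_lwps))
        continue;

      out = *it;
      pending.erase (it);
      return true;
    }
  return false;
}
```

The point of the second check is that an event for a process that was
never resumed stays queued even when the caller waits on a wildcard.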
>> +
>> +  if (ptid.pid () != inferior_ptid.pid ())
>> +    {
>> +      step = 0;
>> +      signo = GDB_SIGNAL_0;
>> +      gdb_assert (!ptid.lwp_p ());
>
> I don't get why you ignore the step request here.  Perhaps it is due to
> a misunderstanding of the resume interface (which was really confusing
> before Pedro's clarification)?
>
> The comment on target_resume and the commit message on Pedro's change
> explain it well, but essentially:
>
>  - The ptid passed as a parameter (SCOPE_PTID) is the set of threads to
>    resume.  If you keep a list of LWPs, then you can just go over that
>    list and resume everything that ptid_t::matches SCOPE_PTID, and that
>    doesn't have a pending event.
>  - STEP indicates whether to resume INFERIOR_PTID in step mode.  If
>    STEP is 0, it means that no thread is resumed in step mode; they are
>    all resumed normally.
>  - SIGNAL indicates what signal to resume INFERIOR_PTID with.  If
>    it's GDB_SIGNAL_0, it means resume without a signal.
>
> So, the last two bullets only modify how the thread identified by
> INFERIOR_PTID is resumed.  I think that in practice, it's guaranteed
> today that INFERIOR_PTID is "within" SCOPE_PTID.  But you can also
> write the code without that assumption, it shouldn't be much harder.
>
> For threads that are not INFERIOR_PTID, I think the target should
> resume them with the signal in thread_info::m_suspend::stop_signal.  But
> that can be a problem for another day.

Mmmm, that would be good to know, that detail is not obvious.

> I'll wait for clarifications from you before continuing to read, because
> I am a bit lost with the approach taken here.
To clarify the code above, it might be helpful to note that the hunk
you are replying to is in a new "resume_one_process" function, which is
called in a loop by the actual resume method:

void
fbsd_nat_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo)
{
  fbsd_nat_debug_start_end ("[%s], step %d, signo %d (%s)",
                            target_pid_to_str (scope_ptid).c_str (), step,
                            signo, gdb_signal_to_name (signo));

  gdb_assert (inferior_ptid.matches (scope_ptid));
  gdb_assert (!scope_ptid.tid_p ());

  if (scope_ptid == minus_one_ptid)
    {
      for (inferior *inf : all_non_exited_inferiors (this))
        resume_one_process (ptid_t (inf->pid), step, signo);
    }
  else
    {
      resume_one_process (scope_ptid, step, signo);
    }
}

The reason the quoted code in resume_one_process clears the step and
signal arguments is that I wasn't sure if I could get a "global" resume
(scope_ptid == minus_one_ptid) where step and/or signo was also set.
In that case I wanted to be sure that I only applied the requested
step/signo to inferior_ptid and not the other processes.  That is, it's
trying to handle a case of:

- inferior_ptid == ptid(P1, T1)
- resume(minus_one_ptid, step=1)

In that case it does the loop over all inferiors, but I only want to
pass step=1 down to inf_ptrace::resume() for the inferior (process)
whose pid is P1.

If you are telling me I can write an assertion in ::resume() that is
more like:

  if (step || signo != GDB_SIGNAL_0)
    gdb_assert (scope_ptid != minus_one_ptid);

then that would mean I could avoid clearing them in resume_one_process.

-- 
John Baldwin
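
P.S. In case it helps to see the P1/T1 case above in isolation, below is
a small standalone model of the intent: on a wildcard resume, step and
signo are forwarded only to the process that owns inferior_ptid, and
every other process is resumed plainly.  ResumeRequest, plan_resume,
kNoSignal, and current_pid are names invented for this sketch (current_pid
stands in for inferior_ptid.pid () and kNoSignal for GDB_SIGNAL_0); none
of them exist in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of what would be passed down for one process.
struct ResumeRequest
{
  int pid;
  bool step;
  int signo;
};

// Stands in for GDB_SIGNAL_0 in this sketch.
constexpr int kNoSignal = 0;

// Model of a wildcard resume: only the process owning the current
// thread keeps the requested step/signo; the rest get neither.
std::vector<ResumeRequest>
plan_resume (const std::vector<int> &pids, int current_pid,
             bool step, int signo)
{
  std::vector<ResumeRequest> plan;
  for (int pid : pids)
    {
      bool is_current = pid == current_pid;
      plan.push_back ({pid, is_current && step,
                       is_current ? signo : kNoSignal});
    }
  return plan;
}
```

If the assertion suggested above turns out to be valid, the per-process
masking in this model becomes unnecessary, since a wildcard resume would
never carry a step or signal in the first place.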