From: John Baldwin <jhb@FreeBSD.org>
To: Simon Marchi <simark@simark.ca>, gdb-patches@sourceware.org
Subject: Re: [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes.
Date: Mon, 27 Mar 2023 14:13:46 -0700 [thread overview]
Message-ID: <a9ff982d-dc96-3d7b-d128-014ed7b8e8f3@FreeBSD.org> (raw)
In-Reply-To: <8fb811a5-f363-d3aa-5b63-4fcc434b3e17@simark.ca>
On 3/20/23 12:55 PM, Simon Marchi wrote:
> On 2/28/23 13:18, John Baldwin wrote:
>> I did not fully understand the requirements of multiple process
>> support when I enabled it previously and several parts were broken.
>> In particular, the resume method was only resuming a single process,
>> and wait was not stopping other processes when reporting an event.
>>
>> To support multiple running inferiors, add a new per-inferior
>> structure which trackes the number of existing and running LWPs for
>> each process. The structure also stores a ptid_t describing the
>> set of LWPs currently resumed for each process.
>
> Ah, that sounds good, related to my comments on previous patches about
> tracking the resume state for each LWP. I don't know if it needs to be
> per-inferior though, versus a global table like Linux does (but perhaps
> it helps, I'll see when reading the code).
>
>>
>> For the resume method, iterate over all non-exited inferiors resuming
>> each process matching the passed in ptid rather than only resuming the
>> current inferior's process for a wildcard ptid. If a resumed process
>> has a pending event, don't actually resume the process, but other
>> matching processes without a pending event are still resumed in case
>> the later call to the wait method requests an event from one of the
>> processes without a pending event.
>>
>> For the wait method, stop other running processes before returning an
>> event to the core. When stopping a process, first check to see if an
>> event is already pending. If it is, queue the event to be reported
>> later. If not, send a SIGSTOP to the process and wait for it to stop.
>> If the event reported by the wait is not for the SIGSTOP, queue the
>> event and remember to ignore a future SIGSTOP event for the process.
>>
>> Note that, unlike the Linux native target, entire processes are
>> stopped rather than individual LWPs. In FreeBSD one can only wait on
>> processes (via pid), not for an event from a specific thread.
>>
>> Other changes in this commit handle bookkeeping for the per-inferior
>> data such as purging the data in the mourn_inferior method and
>> migrating the data to the new inferior in the follow_exec method. The
>> per-inferior data is created in the attach, create_inferior, and
>> follow_fork methods.
>> ---
>> gdb/fbsd-nat.c | 403 +++++++++++++++++++++++++++++++++++++------------
>> gdb/fbsd-nat.h | 8 +
>> 2 files changed, 317 insertions(+), 94 deletions(-)
>>
>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>> index 3f7278c6ea0..14b31ddd86e 100644
>> --- a/gdb/fbsd-nat.c
>> +++ b/gdb/fbsd-nat.c
>> @@ -54,11 +54,26 @@
>> #define PT_SETREGSET 43 /* Set a target register set */
>> #endif
>>
>> -/* Filter for ptid's allowed to report events from wait. Normally set
>> - in resume, but also reset to minus_one_ptid in create_inferior and
>> - attach. */
>> +/* Information stored about each inferior. */
>>
>> -static ptid_t resume_ptid;
>> +struct fbsd_inferior_info
>> +{
>> + /* Filter for resumed LWPs which can report events from wait. */
>> + ptid_t resumed_lwps = null_ptid;
>> +
>> + /* Number of LWPs this process contains. */
>> + unsigned int num_lwps = 0;
>> +
>> + /* Number of LWPs currently running. */
>> + unsigned int running_lwps = 0;
>> +
>> + /* Have a pending SIGSTOP event that needs to be discarded. */
>> + bool pending_sigstop = false;
>> +};
>
> Ok, it's not exactly what I expected, but I will keep on reading.
>
> Long term, I don't think the resumed_lwps field will be enough to
> describe the resume state of individual threads. Actually, to
> complement the example I gave on an earlier patch, I guess you could do
> that today?
>
> (gdb) set scheduler-locking on
> (gdb) thread 1
> (gdb) continue & # only supposed to resume thread 1
> (gdb) thread 2
> (gdb) continue & # only supposed to resume thread 2
>
> Here, resume_lwps would end up as (pid, thread_2_lwp, 0), right?
Ok, that answers a question I had then. I do have a follow-up to this
I haven't posted (I mentioned it in the cover letter) where I replace
the single ptid with an unordered_set<> of LWPs belonging to the
process that should be resumed. However, in order to make this work
I had to make all "real" resume calls deferred using commit_resumed
(but that only kind of works, I have to explicitly do commit_resumed
at the start of wait() because the core doesn't do it for me). Basically,
in my followup the fbsd_nat::resume method just modifies state in the
per-inferior struct to keep track of at most one pending signal/step
and a set of LWPs to resume. Then when either commit_resumed or wait
is called I walk all inferiors for the native target doing the actual
resume (PT_CONTINUE) after using PT_SUSPEND/PT_RESUME on the set of
LWPs that need to actually run for the given process.
Currently though I get new regressions with that approach compared to
this series. :(
FWIW, with this series what would happen for your example above is
that an assert() trips in fbsd_nat::resume when it tries to resume
the second thread when the process is already resumed.
>> +
>> +/* Per-inferior data key. */
>> +
>> +static const registry<inferior>::key<fbsd_inferior_info> fbsd_inferior_data;
>>
>> /* If an event is triggered asynchronously (fake vfork_done events) or
>> occurs when the core is not expecting it, a pending event is
>> @@ -95,21 +110,27 @@ have_pending_event (ptid_t filter)
>> return false;
>> }
>>
>> -/* Helper method called by the target wait method. Returns true if
>> - there is a pending event matching resume_ptid. If there is a
>> - matching event, PTID and *STATUS contain the event details, and the
>> - event is removed from the pending list. */
>> +/* Returns true if there is a pending event for a resumed process
>> + matching FILTER. If there is a matching event, PTID and *STATUS
>> + contain the event details, and the event is removed from the
>> + pending list. */
>>
>> static bool
>> -take_pending_event (ptid_t &ptid, target_waitstatus *status)
>> +take_pending_event (fbsd_nat_target *target, ptid_t filter, ptid_t &ptid,
>> + target_waitstatus *status)
>> {
>> for (auto it = pending_events.begin (); it != pending_events.end (); it++)
>> - if (it->ptid.matches (resume_ptid))
>> + if (it->ptid.matches (filter))
>> {
>> - ptid = it->ptid;
>> - *status = it->status;
>> - pending_events.erase (it);
>> - return true;
>> + inferior *inf = find_inferior_ptid (target, it->ptid);
>> + fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
>> + if (it->ptid.matches (info->resumed_lwps))
>> + {
>> + ptid = it->ptid;
>> + *status = it->status;
>> + pending_events.erase (it);
>> + return true;
>> + }
>
> If that code was kept as-is, I think take_pending_event should be a
> method of fbsd_nat_target, rather than passing it manually.
Ok.
>> +
>> + if (ptid.pid () != inferior_ptid.pid ())
>> + {
>> + step = 0;
>> + signo = GDB_SIGNAL_0;
>> + gdb_assert (!ptid.lwp_p ());
>
> I don't get why you ignore the step request here. Perhaps it is due to
> a misundertanding of the resume interface (which was really confusing
> before Pedro's clarification)?
>
> The comment on target_resume and the commit message on Pedro's change
> explain it well, but essentially:
>
> - The ptid passed as a parameter (SCOPE_PTID) is the set of threads to
> resume. If you keep a list of LWPs, then you can just go over that
> list and resume everything that ptid_t::matches SCOPE_PTID, and that
> doesn't have a pending event.
> - STEP indicates whether to resume INFERIOR_PTID in resume mode. If
> STEP is 0, it means that no thread is resumed in step mode they are
> all resumed normally.
> - SIGNAL indicates what signal to resume INFERIOR_PTID with. If
> it's GDB_SIGNAL_0, it means resume without a signal.
>
> So, the last two bullets only modify how the thread identified by
> INFERIOR_PTID is resumed. I think that in practice, it's guaranteed in
> practice today that INFERIOR_PTID is "within" SCOPE_PTID. But you can
> also write the code without that assumption, it should be much harder.
>
> For threads that are not INFERIOR_PTID, I think the target should
> resume them with the signal in thread_info::m_suspend::stop_signal. But
> that can be a problem for another day.
Mmmm, that would be good to know, that detail is not obvious.
> I'll wait for clarifications from you before continuing to read, because
> I am a bit lost with the approach taken here.
To clarify the code above, it might be helpful to note that there is now
a "resume_one_process" function that you are replying to above that is
called in a loop by the actual resume method:
void
fbsd_nat_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo)
{
fbsd_nat_debug_start_end ("[%s], step %d, signo %d (%s)",
target_pid_to_str (scope_ptid).c_str (), step, signo,
gdb_signal_to_name (signo));
gdb_assert (inferior_ptid.matches (scope_ptid));
gdb_assert (!scope_ptid.tid_p ());
if (scope_ptid == minus_one_ptid)
{
for (inferior *inf : all_non_exited_inferiors (this))
resume_one_process (ptid_t (inf->pid), step, signo);
}
else
{
resume_one_process (scope_ptid, step, signo);
}
}
The reason for the quoted code that clears the step/signal field in
resume_one_process is that I wasn't sure if I could get a "global" resume
(scope_ptid == minus_one_ptid) where step and/or signo was also set. In
that case I wanted to be sure that I only applied to requested step/signo
to inferior_ptid and not the other processes. That is, it's trying to
handle a case of:
- inferior_ptid == ptid(P1, T1)
- resume(minus_one_ptid, step=1)
In that case it does the loop over all inferiors, but I only want
to pass step=1 down to inf_ptrace::resume() for the inferior (process)
whose pid is P1.
If you are telling me I can write an assertion in ::resume() that is
more like:
if (step || signo != GDB_SIGNAL_0)
gdb_assert(scope_ptid != minus_one_ptid);
Then that would mean I could avoid clearing them in resume_one_process.
--
John Baldwin
next prev parent reply other threads:[~2023-03-27 21:13 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
2023-02-28 18:18 ` [PATCH 1/9] fbsd-nat: Add missing spaces John Baldwin
2023-03-20 17:43 ` Simon Marchi
2023-03-20 18:08 ` John Baldwin
2023-02-28 18:18 ` [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind John Baldwin
2023-03-20 17:45 ` Simon Marchi
2023-02-28 18:18 ` [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig John Baldwin
2023-03-20 17:47 ` Simon Marchi
2023-02-28 18:18 ` [PATCH 4/9] fbsd-nat: Add a list of pending events John Baldwin
2023-03-20 18:35 ` Simon Marchi
2023-03-27 20:24 ` John Baldwin
2023-03-27 20:57 ` Simon Marchi
2023-03-27 21:00 ` John Baldwin
2023-02-28 18:18 ` [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait John Baldwin
2023-03-20 19:09 ` Simon Marchi
2023-03-27 20:49 ` John Baldwin
2023-03-27 21:04 ` Simon Marchi
2023-02-28 18:18 ` [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes John Baldwin
2023-03-20 19:55 ` Simon Marchi
2023-03-27 21:13 ` John Baldwin [this message]
2023-02-28 18:18 ` [PATCH 7/9] fbsd-nat: Fix several issues with detaching John Baldwin
2023-02-28 18:18 ` [PATCH 8/9] fbsd-nat: Fix thread_alive against a running thread John Baldwin
2023-02-28 18:18 ` [PATCH 9/9] fbsd-nat: Stop a process if it is running before killing it John Baldwin
2023-03-14 18:21 ` [PING] [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a9ff982d-dc96-3d7b-d128-014ed7b8e8f3@FreeBSD.org \
--to=jhb@freebsd.org \
--cc=gdb-patches@sourceware.org \
--cc=simark@simark.ca \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).