public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: John Baldwin <jhb@FreeBSD.org>
To: Simon Marchi <simark@simark.ca>, gdb-patches@sourceware.org
Subject: Re: [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes.
Date: Mon, 27 Mar 2023 14:13:46 -0700	[thread overview]
Message-ID: <a9ff982d-dc96-3d7b-d128-014ed7b8e8f3@FreeBSD.org> (raw)
In-Reply-To: <8fb811a5-f363-d3aa-5b63-4fcc434b3e17@simark.ca>

On 3/20/23 12:55 PM, Simon Marchi wrote:
> On 2/28/23 13:18, John Baldwin wrote:
>> I did not fully understand the requirements of multiple process
>> support when I enabled it previously and several parts were broken.
>> In particular, the resume method was only resuming a single process,
>> and wait was not stopping other processes when reporting an event.
>>
>> To support multiple running inferiors, add a new per-inferior
>> structure which trackes the number of existing and running LWPs for
>> each process.  The structure also stores a ptid_t describing the
>> set of LWPs currently resumed for each process.
> 
> Ah, that sounds good, related to my comments on previous patches about
> tracking the resume state for each LWP.  I don't know if it needs to be
> per-inferior though, versus a global table like Linux does (but perhaps
> it helps, I'll see when reading the code).
> 
>>
>> For the resume method, iterate over all non-exited inferiors resuming
>> each process matching the passed in ptid rather than only resuming the
>> current inferior's process for a wildcard ptid.  If a resumed process
>> has a pending event, don't actually resume the process, but other
>> matching processes without a pending event are still resumed in case
>> the later call to the wait method requests an event from one of the
>> processes without a pending event.
>>
>> For the wait method, stop other running processes before returning an
>> event to the core.  When stopping a process, first check to see if an
>> event is already pending.  If it is, queue the event to be reported
>> later.  If not, send a SIGSTOP to the process and wait for it to stop.
>> If the event reported by the wait is not for the SIGSTOP, queue the
>> event and remember to ignore a future SIGSTOP event for the process.
>>
>> Note that, unlike the Linux native target, entire processes are
>> stopped rather than individual LWPs.  In FreeBSD one can only wait on
>> processes (via pid), not for an event from a specific thread.
>>
>> Other changes in this commit handle bookkeeping for the per-inferior
>> data such as purging the data in the mourn_inferior method and
>> migrating the data to the new inferior in the follow_exec method.  The
>> per-inferior data is created in the attach, create_inferior, and
>> follow_fork methods.
>> ---
>>   gdb/fbsd-nat.c | 403 +++++++++++++++++++++++++++++++++++++------------
>>   gdb/fbsd-nat.h |   8 +
>>   2 files changed, 317 insertions(+), 94 deletions(-)
>>
>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>> index 3f7278c6ea0..14b31ddd86e 100644
>> --- a/gdb/fbsd-nat.c
>> +++ b/gdb/fbsd-nat.c
>> @@ -54,11 +54,26 @@
>>   #define	PT_SETREGSET	43	/* Set a target register set */
>>   #endif
>>   
>> -/* Filter for ptid's allowed to report events from wait.  Normally set
>> -   in resume, but also reset to minus_one_ptid in create_inferior and
>> -   attach.  */
>> +/* Information stored about each inferior.  */
>>   
>> -static ptid_t resume_ptid;
>> +struct fbsd_inferior_info
>> +{
>> +  /* Filter for resumed LWPs which can report events from wait.  */
>> +  ptid_t resumed_lwps = null_ptid;
>> +
>> +  /* Number of LWPs this process contains.  */
>> +  unsigned int num_lwps = 0;
>> +
>> +  /* Number of LWPs currently running.  */
>> +  unsigned int running_lwps = 0;
>> +
>> +  /* Have a pending SIGSTOP event that needs to be discarded.  */
>> +  bool pending_sigstop = false;
>> +};
> 
> Ok, it's not exactly what I expected, but I will keep on reading.
> 
> Long term, I don't think the resumed_lwps field will be enough to
> describe the resume state of individual threads.  Actually, to
> complement the example I gave on an earlier patch, I guess you could do
> that today?
> 
> (gdb) set scheduler-locking on
> (gdb) thread 1
> (gdb) continue & # only supposed to resume thread 1
> (gdb) thread 2
> (gdb) continue & # only supposed to resume thread 2
> 
> Here, resume_lwps would end up as (pid, thread_2_lwp, 0), right?

Ok, that answers a question I had then.  I do have a follow-up to this
I haven't posted (I mentioned it in the cover letter) where I replace
the single ptid with an unordered_set<> of LWPs belonging to the
process that should be resumed.  However, in order to make this work
I had to make all "real" resume calls deferred using commit_resumed
(but that only kind of works, I have to explicitly do commit_resumed
at the start of wait() because the core doesn't do it for me).  Basically,
in my followup the fbsd_nat::resume method just modifies state in the
per-inferior struct to keep track of at most one pending signal/step
and a set of LWPs to resume.  Then when either commit_resumed or wait
is called I walk all inferiors for the native target doing the actual
resume (PT_CONTINUE) after using PT_SUSPEND/PT_RESUME on the set of
LWPs that need to actually run for the given process.

Currently though I get new regressions with that approach compared to
this series. :(

FWIW, with this series what would happen for your example above is
that an assert() trips in fbsd_nat::resume when it tries to resume
the second thread when the process is already resumed.

>> +
>> +/* Per-inferior data key.  */
>> +
>> +static const registry<inferior>::key<fbsd_inferior_info> fbsd_inferior_data;
>>   
>>   /* If an event is triggered asynchronously (fake vfork_done events) or
>>      occurs when the core is not expecting it, a pending event is
>> @@ -95,21 +110,27 @@ have_pending_event (ptid_t filter)
>>     return false;
>>   }
>>   
>> -/* Helper method called by the target wait method.  Returns true if
>> -   there is a pending event matching resume_ptid.  If there is a
>> -   matching event, PTID and *STATUS contain the event details, and the
>> -   event is removed from the pending list.  */
>> +/* Returns true if there is a pending event for a resumed process
>> +   matching FILTER.  If there is a matching event, PTID and *STATUS
>> +   contain the event details, and the event is removed from the
>> +   pending list.  */
>>   
>>   static bool
>> -take_pending_event (ptid_t &ptid, target_waitstatus *status)
>> +take_pending_event (fbsd_nat_target *target, ptid_t filter, ptid_t &ptid,
>> +		    target_waitstatus *status)
>>   {
>>     for (auto it = pending_events.begin (); it != pending_events.end (); it++)
>> -    if (it->ptid.matches (resume_ptid))
>> +    if (it->ptid.matches (filter))
>>         {
>> -	ptid = it->ptid;
>> -	*status = it->status;
>> -	pending_events.erase (it);
>> -	return true;
>> +	inferior *inf = find_inferior_ptid (target, it->ptid);
>> +	fbsd_inferior_info *info = fbsd_inferior_data.get (inf);
>> +	if (it->ptid.matches (info->resumed_lwps))
>> +	  {
>> +	    ptid = it->ptid;
>> +	    *status = it->status;
>> +	    pending_events.erase (it);
>> +	    return true;
>> +	  }
> 
> If that code was kept as-is, I think take_pending_event should be a
> method of fbsd_nat_target, rather than passing it manually.

Ok.

>> +
>> +  if (ptid.pid () != inferior_ptid.pid ())
>> +    {
>> +      step = 0;
>> +      signo = GDB_SIGNAL_0;
>> +      gdb_assert (!ptid.lwp_p ());
> 
> I don't get why you ignore the step request here.  Perhaps it is due to
> a misundertanding of the resume interface (which was really confusing
> before Pedro's clarification)?
> 
> The comment on target_resume and the commit message on Pedro's change
> explain it well, but essentially:
> 
>   - The ptid passed as a parameter (SCOPE_PTID) is the set of threads to
>     resume.  If you keep a list of LWPs, then you can just go over that
>     list and resume everything that ptid_t::matches SCOPE_PTID, and that
>     doesn't have a pending event.
>   - STEP indicates whether to resume INFERIOR_PTID in resume mode.  If
>     STEP is 0, it means that no thread is resumed in step mode they are
>     all resumed normally.
>   - SIGNAL indicates what signal to resume INFERIOR_PTID with.  If
>     it's GDB_SIGNAL_0, it means resume without a signal.
> 
> So, the last two bullets only modify how the thread identified by
> INFERIOR_PTID is resumed.  I think that in practice, it's guaranteed in
> practice today that INFERIOR_PTID is "within" SCOPE_PTID.  But you can
> also write the code without that assumption, it should be much harder.
> 
> For threads that are not INFERIOR_PTID, I think the target should
> resume them with the signal in thread_info::m_suspend::stop_signal.  But
> that can be a problem for another day.

Mmmm, that would be good to know, that detail is not obvious.

> I'll wait for clarifications from you before continuing to read, because
> I am a bit lost with the approach taken here.

To clarify the code above, it might be helpful to note that there is now
a "resume_one_process" function that you are replying to above that is
called in a loop by the actual resume method:

void
fbsd_nat_target::resume (ptid_t scope_ptid, int step, enum gdb_signal signo)
{
   fbsd_nat_debug_start_end ("[%s], step %d, signo %d (%s)",
			    target_pid_to_str (scope_ptid).c_str (), step, signo,
			    gdb_signal_to_name (signo));

   gdb_assert (inferior_ptid.matches (scope_ptid));
   gdb_assert (!scope_ptid.tid_p ());

   if (scope_ptid == minus_one_ptid)
     {
       for (inferior *inf : all_non_exited_inferiors (this))
	resume_one_process (ptid_t (inf->pid), step, signo);
     }
   else
     {
       resume_one_process (scope_ptid, step, signo);
     }
}

The reason for the quoted code that clears the step/signal field in
resume_one_process is that I wasn't sure if I could get a "global" resume
(scope_ptid == minus_one_ptid) where step and/or signo was also set.  In
that case I wanted to be sure that I only applied to requested step/signo
to inferior_ptid and not the other processes.  That is, it's trying to
handle a case of:

- inferior_ptid == ptid(P1, T1)
- resume(minus_one_ptid, step=1)

In that case it does the loop over all inferiors, but I only want
to pass step=1 down to inf_ptrace::resume() for the inferior (process)
whose pid is P1.

If you are telling me I can write an assertion in ::resume() that is
more like:

if (step || signo != GDB_SIGNAL_0)
    gdb_assert(scope_ptid != minus_one_ptid);

Then that would mean I could avoid clearing them in resume_one_process.

-- 
John Baldwin


  reply	other threads:[~2023-03-27 21:13 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
2023-02-28 18:18 ` [PATCH 1/9] fbsd-nat: Add missing spaces John Baldwin
2023-03-20 17:43   ` Simon Marchi
2023-03-20 18:08     ` John Baldwin
2023-02-28 18:18 ` [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind John Baldwin
2023-03-20 17:45   ` Simon Marchi
2023-02-28 18:18 ` [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig John Baldwin
2023-03-20 17:47   ` Simon Marchi
2023-02-28 18:18 ` [PATCH 4/9] fbsd-nat: Add a list of pending events John Baldwin
2023-03-20 18:35   ` Simon Marchi
2023-03-27 20:24     ` John Baldwin
2023-03-27 20:57       ` Simon Marchi
2023-03-27 21:00       ` John Baldwin
2023-02-28 18:18 ` [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait John Baldwin
2023-03-20 19:09   ` Simon Marchi
2023-03-27 20:49     ` John Baldwin
2023-03-27 21:04       ` Simon Marchi
2023-02-28 18:18 ` [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes John Baldwin
2023-03-20 19:55   ` Simon Marchi
2023-03-27 21:13     ` John Baldwin [this message]
2023-02-28 18:18 ` [PATCH 7/9] fbsd-nat: Fix several issues with detaching John Baldwin
2023-02-28 18:18 ` [PATCH 8/9] fbsd-nat: Fix thread_alive against a running thread John Baldwin
2023-02-28 18:18 ` [PATCH 9/9] fbsd-nat: Stop a process if it is running before killing it John Baldwin
2023-03-14 18:21 ` [PING] [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a9ff982d-dc96-3d7b-d128-014ed7b8e8f3@FreeBSD.org \
    --to=jhb@freebsd.org \
    --cc=gdb-patches@sourceware.org \
    --cc=simark@simark.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).