public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Simon Marchi <simark@simark.ca>
To: John Baldwin <jhb@FreeBSD.org>, gdb-patches@sourceware.org
Subject: Re: [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait.
Date: Mon, 27 Mar 2023 17:04:46 -0400	[thread overview]
Message-ID: <1bc805b6-04f7-a22a-c666-67054ad09dfa@simark.ca> (raw)
In-Reply-To: <fb0a6e52-1e39-7995-4115-471dcae4d1f3@FreeBSD.org>

On 3/27/23 16:49, John Baldwin wrote:
> On 3/20/23 12:09 PM, Simon Marchi wrote:
>> On 2/28/23 13:18, John Baldwin wrote:
>>> If wait_1 finds an event for a thread or process that does not match
>>> the set of threads and processes previously resumed, defer the event.
>>> If the event is for a specific thread, suspend the thread and continue
>>> the associated process before waiting for another event.
>>
>> I do not understand this, because it doesn't match my mental model of
>> how things should work (which is derived from working with linux-nat).
>> In that model, a thread with a pending event should never be
>> ptrace-resumed.  So the last sentence doesn't make sense, it implies
>> that a thread has a pending and is ptrace-resumed (since we suspend it).
> 
> The difference here might be that on FreeBSD ptrace can only PT_CONTINUE
> and wait() for entire processes.  When a thread contains multiple threads,
> PT_CONTINUE resumes all threads belonging to that process.  Individual
> threads can be controlled by using PT_SUSPEND on a specific LWP.  Those
> LWPs will stay asleep after PT_CONTINUE.  If any thread encounters a
> stop event (breakpoint, fork event, etc.) the entire process stops (the
> kernel signals threads in userland if needed to get them back into the
> kernel to stop).  Once it is stopped it reports an event via wait().
> 
> Currently on FreeBSD wait() only reports a single event at once.  So
> if multiple threads hit a breakpoint, wait() returns for the first one,
> PT_LWPINFO is used on the pid to find out what specific event happened
> (and which LWP it happened for).  The other thread will keep its event
> pending in the kernel, and after PT_CONTINUE it will immediately trigger
> another stop and an event reported from wait().

Thanks for the explanation.  I'll keep that in mind when re-reading
(perhaps a v2, if you want to rebase and submit a fresh version).

>>> One specific example of such an event is if a thread is created while
>>> another thread in the same process hits a breakpoint.  If the second
>>> thread's event is reported first, the target resume method does not
>>> yet "know" about the new thread and will not suspend it via
>>> PT_SUSPEND.
>>
>> I was surprised to read that the resume method suspends some threads.  I
>> don't think it should, but I do see the current code.  The current code
>> appears to interpret the resume request as: "ensure the state of thread
>> resumption looks exactly like this", so it suspends threads as needed,
>> to match the requested resumption state.  I don't think it's supposed to
>> work like that.  The resume method should only resume some threads, and
>> if the core of GDB wants some threads stopped, it will call the stop
>> method.
> 
> See above.  I can't individually suspend/resume threads.  I can only
> resume processes, but decide which set of threads within a process
> to keep resumed when resuming the entire process.

Ack.

> 
>> Anyhow, just to be sure I understand the problem you describe: when
>> fbsd-nat reports the breakpoint hit from "another thread", is the
>> "thread created" event still in the kernel, not consumed by GDB?
> 
> Yes.  What happens is that GDB tries to single-step a thread over a
> breakpoint, so it does a resume() of a single ptid_t.  To implement
> this, fbsd-nat's resume has to explicitly PT_SUSPEND all the other
> LWPs in the process before doing a PT_CONTINUE so that only the
> stepped thread will execute.  However, this logic doesn't yet know
> about the new thread (and it hasn't started executing in userland
> yet).  However, once the new thread starts executing, it immediately
> reports a ptrace event.  This confuses GDB currently becuase that
> ptrace event can be reported right after the PT_CONTINUE.  I think
> in part this is because unlike on Linux, when thread A creates thread
> B, thread A doesn't report anything.  Instead, only thread B reports
> an event when it starts.  On Linux thread creation is more like fork
> where thread A reports an event saying it created B.

Ack.

>  
>>> When wait is called, it will probably return the event
>>> from the first thread before the result of the step from second
>>> thread.  This is the case reported in PR 21497.
>>
>>> ---
>>>   gdb/fbsd-nat.c | 34 +++++++++++++++++++++++++++++++++-
>>>   1 file changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
>>> index ca278b871ef..3f7278c6ea0 100644
>>> --- a/gdb/fbsd-nat.c
>>> +++ b/gdb/fbsd-nat.c
>>> @@ -1514,7 +1514,39 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
>>>     if (is_async_p ())
>>>       async_file_flush ();
>>>   -  wptid = wait_1 (ptid, ourstatus, target_options);
>>> +  while (1)
>>> +    {
>>> +      wptid = wait_1 (ptid, ourstatus, target_options);
>>> +
>>> +      /* If no event was found, just return.  */
>>> +      if (ourstatus->kind () == TARGET_WAITKIND_IGNORE
>>> +      || ourstatus->kind () == TARGET_WAITKIND_NO_RESUMED)
>>> +    break;
>>> +
>>> +      /* If an event is reported for a thread or process while
>>> +     stepping some other thread, suspend the thread reporting the
>>> +     event and defer the event until it can be reported to the
>>> +     core.  */
>>> +      if (!wptid.matches (resume_ptid))
>>> +    {
>>> +      add_pending_event (wptid, *ourstatus);
>>> +      fbsd_nat_debug_printf ("deferring event [%s], [%s]",
>>> +                 target_pid_to_str (wptid).c_str (),
>>> +                 ourstatus->to_string ().c_str ());
>>> +      if (wptid.pid () == resume_ptid.pid ())
>>> +        {
>>> +          fbsd_nat_debug_printf ("suspending thread [%s]",
>>> +                     target_pid_to_str (wptid).c_str ());
>>> +          if (ptrace (PT_SUSPEND, wptid.lwp (), NULL, 0) == -1)
>>> +        perror_with_name (("ptrace (PT_SUSPEND)"));
>>
>>
>> This is the part I don't understand.  After wait_1 returned an event for
>> a specific thread, I would expect this thread to be ptrace-stopped.  So,
>> I don't see the need to suspend it.  You'd just leave it in the "resumed
>> from the
>> until.
>>
>>> +          if (ptrace (PT_CONTINUE, wptid.pid (), (caddr_t) 1, 0) == -1)
>>> +        perror_with_name (("ptrace (PT_CONTINUE)"));
>>
>> I don't get why you continue `wptid.pid ()`.  What assures you that this
>> particular thread was stopped previously?  Perhaps I don't understand
>> because I assume somethings about pids/lwps and ptrace that are only
>> true on Linux.
> 
> In this specific case, here is the sequence of events:
> 
> - process P1 contains two threads, T1, and T2
> - T1 creates a new thread T3, but T3 isn't yet scheduled to execute
> - T2 hits a breakpoint so the process stops
> - wait() returns P1
> - PT_LWPINFO reports the breakpoint event for T2
> - GDB undoes the breakpoint and tries to single-step T2 over the
>   breakpoint
> - GDB calls resume with step=1 and scope_ptid=ptid(P1, T2)
> - fbsd_nat::resume iterates over the threads it knows about, but
>   it only knows about T1 and T2
>   - PT_SUSPEND T1
>   - PT_RESUME T2
> - the P1 process is now resumed via PT_CONTINUE
> - T3 starts executing and reports its "born" event
> - wait() returns P1
> - PT_LWPINFO reports the thread birth event for T3
> 
> Previously at this point fbsd_nat::wait() returned an event for T3
> which triggers the assertion failure in the PR.  With this patch
> what happens now is:
> 
> - fbsd_nat::wait() realizes T3 isn't "resumed" from the core's
>   perspective so explicitly suspends it via PT_SUSPEND
> - the P1 process is continued via PT_CONTINUE
> - T2 completes its step and reports the event
> - wait() returns P1
> - PT_LWPINFO reports the step-complete event for P2
> - the event for T2 is returned from fbsd_nat::wait() to the core

Ok, thanks, that makes sense.

Simon

  reply	other threads:[~2023-03-27 21:04 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-28 18:18 [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin
2023-02-28 18:18 ` [PATCH 1/9] fbsd-nat: Add missing spaces John Baldwin
2023-03-20 17:43   ` Simon Marchi
2023-03-20 18:08     ` John Baldwin
2023-02-28 18:18 ` [PATCH 2/9] fbsd-nat: Avoid a direct write to target_waitstatus::kind John Baldwin
2023-03-20 17:45   ` Simon Marchi
2023-02-28 18:18 ` [PATCH 3/9] fbsd-nat: Use correct constant for target_waitstatus::sig John Baldwin
2023-03-20 17:47   ` Simon Marchi
2023-02-28 18:18 ` [PATCH 4/9] fbsd-nat: Add a list of pending events John Baldwin
2023-03-20 18:35   ` Simon Marchi
2023-03-27 20:24     ` John Baldwin
2023-03-27 20:57       ` Simon Marchi
2023-03-27 21:00       ` John Baldwin
2023-02-28 18:18 ` [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait John Baldwin
2023-03-20 19:09   ` Simon Marchi
2023-03-27 20:49     ` John Baldwin
2023-03-27 21:04       ` Simon Marchi [this message]
2023-02-28 18:18 ` [PATCH 6/9] fbsd-nat: Fix resuming and waiting with multiple processes John Baldwin
2023-03-20 19:55   ` Simon Marchi
2023-03-27 21:13     ` John Baldwin
2023-02-28 18:18 ` [PATCH 7/9] fbsd-nat: Fix several issues with detaching John Baldwin
2023-02-28 18:18 ` [PATCH 8/9] fbsd-nat: Fix thread_alive against a running thread John Baldwin
2023-02-28 18:18 ` [PATCH 9/9] fbsd-nat: Stop a process if it is running before killing it John Baldwin
2023-03-14 18:21 ` [PING] [PATCH 0/9] Fixes for multiprocess for FreeBSD's native target John Baldwin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1bc805b6-04f7-a22a-c666-67054ad09dfa@simark.ca \
    --to=simark@simark.ca \
    --cc=gdb-patches@sourceware.org \
    --cc=jhb@FreeBSD.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).