From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2610:1c1:1:606c::19:2]) by sourceware.org (Postfix) with ESMTPS id 6D4673858C54 for ; Mon, 27 Mar 2023 20:49:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6D4673858C54 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=FreeBSD.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [96.47.72.80]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits)) (Client CN "mx1.freebsd.org", Issuer "R3" (verified OK)) by mx2.freebsd.org (Postfix) with ESMTPS id 4PllLM2S56z4Gtg; Mon, 27 Mar 2023 20:49:47 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4PllLM1dz0z3Ppg; Mon, 27 Mar 2023 20:49:47 +0000 (UTC) (envelope-from jhb@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1679950187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5CmH2b6uhKm/LhIsswVGce5I8ed4gkxq0ViZgnZpLto=; b=V75zVkxDUEdAfyLGiqjqlry1ozoHKIOW0KMNdcqfCY345v3TEcHZg5jc2ncCAQCpBBHsk1 DmD2chHfCRhBiET7y3WUc6Qs31+8ZsOzwD/NgnNFGLMZFQYtMiY2+gyAUGey4U2YcsGLCB LrM+wT8ydp6ZgU+6m8klLpJrArL3B9x23avHYLIrvAsOM5gSHtUoRNqkBG0MUobfzaw0KA F9NVvmhzXMclZUFA32Xkrdg76rV2fBxrlqzMErq4Q2bJUp5p613Zr7oaULVfu5i8ZXQgHx Pg35SMacmE7RVOLsgeYSFElSdUrLRNTaea+xwjCEbdyXBc0dk3GWn4tVtwC+gA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1679950187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5CmH2b6uhKm/LhIsswVGce5I8ed4gkxq0ViZgnZpLto=; b=rflg5J38Bf0i7wKai9+Azytw9MBv+Ik/ujxUmjsuhDeoRoNVJCa9ulbN1GoXzotHx2xyuH nm7DoFn9cOGV5Z3ZO52hRZrZwjNlWY5yxJiy2KcrbkSlsskfQ/2iTNtzrpMf/FwniNpCpP 6Yjz+HCErb22F4upfy/bOH9nGViDBk8riMyKv5lMQ/71cb3QPFgHJtdzGJdMgh0LpAkdSM o/WXJSrOV2chj2NGArqTIptQB6NmPGiIjoE/sSyHMlT9z/4EB51rQks+N8UWx3cAUknnmH JWc3goIs2aMxanbZ9XKxQmePPKTEcZS7mcXJZA19D2sOFcDBgVN484IsGEiqSw== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1679950187; a=rsa-sha256; cv=none; b=YEqNc8omN4gH9KU8n3cOE5hv8L/MA+Io4rK7qkKX21Crh7hhAjxO9a2yIkd6b8LVrI8dgm P6XbWDO1eb49l85CkuqoVFM28cgOwYkiqP8uFSRhYZUIgXZpvpQ53eq/elXCdYhTqm+4wd uUImFc5fNiMVEifVJw4jgf49okF4KqG8YSPOc9UIUmyI46Vq051pKNVmQ/8ocCJwoGdbod k7Qjo6QkJdiD3JGm2rPu8iGg/UlQIICqz+/4L89nA8ZrFBAH10HFxdlNDGhNOyKNf723SW MFLvw8peue0SkWpDqLRKngA6+o+JWYiGG/boZ9WRxqpPabZ7VheMn/l42mUXYA== Received: from [IPV6:2601:648:8680:16b0:c10c:3358:4516:c03f] (unknown [IPv6:2601:648:8680:16b0:c10c:3358:4516:c03f]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) (Authenticated sender: jhb) by smtp.freebsd.org (Postfix) with ESMTPSA id 4PllLL5nr6zLR9; Mon, 27 Mar 2023 20:49:46 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Message-ID: Date: Mon, 27 Mar 2023 13:49:45 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.9.0 Content-Language: en-US To: Simon Marchi , gdb-patches@sourceware.org References: <20230228181845.99936-1-jhb@FreeBSD.org> <20230228181845.99936-6-jhb@FreeBSD.org> From: John Baldwin Subject: Re: [PATCH 5/9] fbsd-nat: Defer any ineligible events reported by wait. In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 3/20/23 12:09 PM, Simon Marchi wrote: > On 2/28/23 13:18, John Baldwin wrote: >> If wait_1 finds an event for a thread or process that does not match >> the set of threads and processes previously resumed, defer the event. >> If the event is for a specific thread, suspend the thread and continue >> the associated process before waiting for another event. > > I do not understand this, because it doesn't match my mental model of > how things should work (which is derived from working with linux-nat). > In that model, a thread with a pending event should never be > ptrace-resumed. So the last sentence doesn't make sense, it implies > that a thread has a pending and is ptrace-resumed (since we suspend it). The difference here might be that on FreeBSD ptrace can only PT_CONTINUE and wait() for entire processes. When a thread contains multiple threads, PT_CONTINUE resumes all threads belonging to that process. Individual threads can be controlled by using PT_SUSPEND on a specific LWP. Those LWPs will stay asleep after PT_CONTINUE. If any thread encounters a stop event (breakpoint, fork event, etc.) the entire process stops (the kernel signals threads in userland if needed to get them back into the kernel to stop). Once it is stopped it reports an event via wait(). Currently on FreeBSD wait() only reports a single event at once. So if multiple threads hit a breakpoint, wait() returns for the first one, PT_LWPINFO is used on the pid to find out what specific event happened (and which LWP it happened for). The other thread will keep its event pending in the kernel, and after PT_CONTINUE it will immediately trigger another stop and an event reported from wait(). >> One specific example of such an event is if a thread is created while >> another thread in the same process hits a breakpoint. If the second >> thread's event is reported first, the target resume method does not >> yet "know" about the new thread and will not suspend it via >> PT_SUSPEND. > > I was surprised to read that the resume method suspends some threads. I > don't think it should, but I do see the current code. The current code > appears to interpret the resume request as: "ensure the state of thread > resumption looks exactly like this", so it suspends threads as needed, > to match the requested resumption state. I don't think it's supposed to > work like that. The resume method should only resume some threads, and > if the core of GDB wants some threads stopped, it will call the stop > method. See above. I can't individually suspend/resume threads. I can only resume processes, but decide which set of threads within a process to keep resumed when resuming the entire process. > Anyhow, just to be sure I understand the problem you describe: when > fbsd-nat reports the breakpoint hit from "another thread", is the > "thread created" event still in the kernel, not consumed by GDB? Yes. What happens is that GDB tries to single-step a thread over a breakpoint, so it does a resume() of a single ptid_t. To implement this, fbsd-nat's resume has to explicitly PT_SUSPEND all the other LWPs in the process before doing a PT_CONTINUE so that only the stepped thread will execute. However, this logic doesn't yet know about the new thread (and it hasn't started executing in userland yet). However, once the new thread starts executing, it immediately reports a ptrace event. This confuses GDB currently becuase that ptrace event can be reported right after the PT_CONTINUE. I think in part this is because unlike on Linux, when thread A creates thread B, thread A doesn't report anything. Instead, only thread B reports an event when it starts. On Linux thread creation is more like fork where thread A reports an event saying it created B. >> When wait is called, it will probably return the event >> from the first thread before the result of the step from second >> thread. This is the case reported in PR 21497. > >> --- >> gdb/fbsd-nat.c | 34 +++++++++++++++++++++++++++++++++- >> 1 file changed, 33 insertions(+), 1 deletion(-) >> >> diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c >> index ca278b871ef..3f7278c6ea0 100644 >> --- a/gdb/fbsd-nat.c >> +++ b/gdb/fbsd-nat.c >> @@ -1514,7 +1514,39 @@ fbsd_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus, >> if (is_async_p ()) >> async_file_flush (); >> >> - wptid = wait_1 (ptid, ourstatus, target_options); >> + while (1) >> + { >> + wptid = wait_1 (ptid, ourstatus, target_options); >> + >> + /* If no event was found, just return. */ >> + if (ourstatus->kind () == TARGET_WAITKIND_IGNORE >> + || ourstatus->kind () == TARGET_WAITKIND_NO_RESUMED) >> + break; >> + >> + /* If an event is reported for a thread or process while >> + stepping some other thread, suspend the thread reporting the >> + event and defer the event until it can be reported to the >> + core. */ >> + if (!wptid.matches (resume_ptid)) >> + { >> + add_pending_event (wptid, *ourstatus); >> + fbsd_nat_debug_printf ("deferring event [%s], [%s]", >> + target_pid_to_str (wptid).c_str (), >> + ourstatus->to_string ().c_str ()); >> + if (wptid.pid () == resume_ptid.pid ()) >> + { >> + fbsd_nat_debug_printf ("suspending thread [%s]", >> + target_pid_to_str (wptid).c_str ()); >> + if (ptrace (PT_SUSPEND, wptid.lwp (), NULL, 0) == -1) >> + perror_with_name (("ptrace (PT_SUSPEND)")); > > > This is the part I don't understand. After wait_1 returned an event for > a specific thread, I would expect this thread to be ptrace-stopped. So, > I don't see the need to suspend it. You'd just leave it in the "resumed > from the > until. > >> + if (ptrace (PT_CONTINUE, wptid.pid (), (caddr_t) 1, 0) == -1) >> + perror_with_name (("ptrace (PT_CONTINUE)")); > > I don't get why you continue `wptid.pid ()`. What assures you that this > particular thread was stopped previously? Perhaps I don't understand > because I assume somethings about pids/lwps and ptrace that are only > true on Linux. In this specific case, here is the sequence of events: - process P1 contains two threads, T1, and T2 - T1 creates a new thread T3, but T3 isn't yet scheduled to execute - T2 hits a breakpoint so the process stops - wait() returns P1 - PT_LWPINFO reports the breakpoint event for T2 - GDB undoes the breakpoint and tries to single-step T2 over the breakpoint - GDB calls resume with step=1 and scope_ptid=ptid(P1, T2) - fbsd_nat::resume iterates over the threads it knows about, but it only knows about T1 and T2 - PT_SUSPEND T1 - PT_RESUME T2 - the P1 process is now resumed via PT_CONTINUE - T3 starts executing and reports its "born" event - wait() returns P1 - PT_LWPINFO reports the thread birth event for T3 Previously at this point fbsd_nat::wait() returned an event for T3 which triggers the assertion failure in the PR. With this patch what happens now is: - fbsd_nat::wait() realizes T3 isn't "resumed" from the core's perspective so explicitly suspends it via PT_SUSPEND - the P1 process is continued via PT_CONTINUE - T2 completes its step and reports the event - wait() returns P1 - PT_LWPINFO reports the step-complete event for P2 - the event for T2 is returned from fbsd_nat::wait() to the core -- John Baldwin