From: Andrew Burgess <aburgess@redhat.com>
To: Simon Marchi <simark@simark.ca>, gdb-patches@sourceware.org
Cc: tankut.baris.aktemur@intel.com
Subject: Re: [PATCHv2 5/8] gdb: don't resume vfork parent while child is still running
Date: Fri, 21 Jul 2023 10:47:04 +0100 [thread overview]
Message-ID: <87y1j9fvfr.fsf@redhat.com> (raw)
In-Reply-To: <3e1e1db0-13d9-dd32-b4bb-051149ae6e76@simark.ca>
Simon Marchi <simark@simark.ca> writes:
> On 2023-07-04 11:22, Andrew Burgess via Gdb-patches wrote:
>> Like the last few commit, this fixes yet another vfork related issue.
>> Like the commit titled:
>>
>> gdb: don't restart vfork parent while waiting for child to finish
>>
>> which addressed a case in linux-nat where we would try to resume a
>> vfork parent, this commit addresses a very similar case, but this time
>> occurring in infrun.c. Just like with that previous commit, this bug
>> results in the assert:
>>
>> x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
>>
>> In this case the issue occurs when target-non-stop is on, but non-stop
>> is off, and again, schedule-multiple is on. As with the previous
>> commit, GDB is in follow-fork-mode child. If you have not done so, it
>> is worth reading the earlier commit as many of the problems leading to
>> the failure are the same, they just appear in a different part of GDB.
>>
>> Here are the steps leading to the assertion failure:
>>
>> 1. The user performs a 'next' over a vfork, GDB stop in the vfork
>> child,
>>
>> 2. As we are planning to follow the child GDB sets the vfork_parent
>> and vfork_child member variables in the two inferiors, the
>> thread_waiting_for_vfork_done member is left as nullptr, that member
>> is only used when GDB is planning to follow the parent inferior,
>>
>> 3. The user does 'continue', our expectation is that the vfork child
>> should resume, and once that process has exited or execd, GDB should
>> detach from the vfork parent. As a result of the 'continue' GDB
>> eventually enters the proceed function,
>>
>> 4. In proceed we selected a ptid_t to resume, because
>> schedule-multiple is on we select minus_one_ptid (see
>> user_visible_resume_ptid),
>>
>> 5. As GDB is running in all-stop on top of non-stop mode, in the
>> proceed function we iterate over all threads that match the resume
>> ptid, which turns out to be all threads, and call
>> proceed_resume_thread_checked. One of the threads we iterate over
>> is the vfork parent thread,
>>
>> 6. As the thread passed to proceed_resume_thread_checked doesn't
>> match any of the early return conditions, GDB will set the thread
>> resumed,
>>
>> 7. As we are resuming one thread at a time, this thread is seen by
>> the lower layers (e.g. linux-nat) as the "event thread", which means
>> we don't apply any of the checks, e.g. is this thread a
>> vfork parent, instead we assume that GDB core knows what it's doing,
>> and linux-nat will resume the thread, we have now incorrectly set
>> running the vfork parent thread when this thread should be waiting
>> for the vfork child to complete,
>>
>> 8. Back in the proceed function GDB continues to iterate over all
>> threads, and now (correctly) resumes the vfork child thread,
>>
>> 8. As the vfork child is still alive the kernel holds the vfork
>> parent stopped,
>>
>> 9. Eventually the child performs its exec and GDB is sent and EXECD
>> event. However, because the parent is resumed, as soon as the child
>> performs its exec the vfork parent also sends a VFORK_DONE event to
>> GDB,
>>
>> 10. Depending on timing both of these events might seem to arrive in
>> GDB at the same time. Normally GDB expects to see the EXECD or
>> EXITED/SIGNALED event from the vfork child before getting the
>> VFORK_DONE in the parent. We know this because it is as a result of
>> the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see
>> handle_vfork_child_exec_or_exit for details). Further the comment
>> in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that
>> when we remain attached to the child (not the parent) we should not
>> expect to see a VFORK_DONE,
>>
>> 11. If both events arrive at the same time then GDB will randomly
>> choose one event to handle first, in some cases this will be the
>> VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB
>> expects that (a) the vfork child has finished, however, in this case
>> this is not completely true, the child has finished, but GDB has not
>> processed the event associated with the completion yet, and (b) upon
>> seeing a VFORK_DONE GDB assumes we are remaining attached to the
>> parent, and so resumes the parent process,
>>
>> 12. GDB now handles the EXECD event. In our case we are detaching
>> from the parent, so GDB calls target_detach (see
>> handle_vfork_child_exec_or_exit),
>>
>> 13. While this has been going on the vfork parent is executing, and
>> might even exit,
>>
>> 14. In linux_nat_target::detach the first thing we do is stop all
>> threads in the process we're detaching from, the result of the stop
>> request will be cached on the lwp_info object,
>>
>> 15. In our case the vfork parent has exited though, so when GDB
>> waits for the thread, instead of a stop due to signal, we instead
>> get a thread exited status,
>>
>> 16. Later in the detach process we try to resume the threads just
>> prior to making the ptrace call to actually detach (see
>> detach_one_lwp), as part of the process to resume a thread we try to
>> touch some registers within the thread, and before doing this GDB
>> asserts that the thread is stopped,
>>
>> 17. An exited thread is not classified as stopped, and so the assert
>> triggers!
>>
>> Just like with the earlier commit, the fix is to spot the vfork parent
>> status of the thread, and not resume such threads. Where the earlier
>> commit fixed this in linux-nat, in this case I think the fix should
>> live in infrun.c, in proceed_resume_thread_checked. This function
>> already has a similar check to not resume the vfork parent in the case
>> where we are planning to follow the vfork parent, I propose adding a
>> similar case that checks for the vfork parent when we plan to follow
>> the vfork child.
>>
>> This new check will mean that at step #6 above GDB doesn't try to
>> resume the vfork parent thread, which prevents the VFORK_DONE from
>> ever arriving. Once GDB sees the EXECD/EXITED/SIGNALLED event from
>> the vfork child GDB will detach from the parent.
>>
>> There's no test included in this commit. In a subsequent commit I
>> will expand gdb.base/foll-vfork.exp which is when this bug would be
>> exposed.
>>
>> If you do want to reproduce this failure then you will for certainly
>> need to run the gdb.base/foll-vfork.exp test in a loop as the failures
>> are all very timing sensitive. I've found that running multiple
>> copies in parallel makes the failure more likely to appear, I usually
>> run ~6 copies in parallel and expect to see a failure after within
>> 10mins.
>
> Hi Andrew,
>
> Since this commit, I see this on native-gdbserver and
> native-extended-gdbserver:
>
> FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to end of inferior 2 (timeout)
> FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1 (timeout)
> FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1 (timeout)
> FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to break_parent (timeout)
>
> I haven't had the time to read this vfork series, but I look forward to,
> since I also did some vfork fixes not too long ago.
Thanks, I'll take a look.
Andrew
next prev parent reply other threads:[~2023-07-21 9:47 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-22 13:17 [PATCH 0/8] Some vfork related fixes Andrew Burgess
2023-06-22 13:17 ` [PATCH 1/8] gdb: catch more errors in gdb.base/foll-vfork.exp Andrew Burgess
2023-06-22 13:17 ` [PATCH 2/8] gdb: don't restart vfork parent while waiting for child to finish Andrew Burgess
2023-06-22 13:17 ` [PATCH 3/8] gdb: fix an issue with vfork in non-stop mode Andrew Burgess
2023-06-22 13:17 ` [PATCH 4/8] gdb, infrun: refactor part of `proceed` into separate function Andrew Burgess
2023-06-28 8:47 ` Aktemur, Tankut Baris
2023-07-04 15:24 ` Andrew Burgess
2023-06-22 13:17 ` [PATCH 5/8] gdb: don't resume vfork parent while child is still running Andrew Burgess
2023-06-22 13:17 ` [PATCH 6/8] gdb/testsuite: expand gdb.base/foll-vfork.exp Andrew Burgess
2023-06-22 13:17 ` [PATCH 7/8] gdb/testsuite: remove use of sleep from gdb.base/foll-vfork.exp Andrew Burgess
2023-06-22 13:17 ` [PATCH 8/8] gdb: additional debug output in infrun.c and linux-nat.c Andrew Burgess
2023-07-04 15:22 ` [PATCHv2 0/8] Some vfork related fixes Andrew Burgess
2023-07-04 15:22 ` [PATCHv2 1/8] gdb: catch more errors in gdb.base/foll-vfork.exp Andrew Burgess
2023-07-04 15:22 ` [PATCHv2 2/8] gdb: don't restart vfork parent while waiting for child to finish Andrew Burgess
2023-07-05 10:08 ` Aktemur, Tankut Baris
2023-07-04 15:22 ` [PATCHv2 3/8] gdb: fix an issue with vfork in non-stop mode Andrew Burgess
2023-07-04 15:22 ` [PATCHv2 4/8] gdb, infrun: refactor part of `proceed` into separate function Andrew Burgess
2023-07-04 15:22 ` [PATCHv2 5/8] gdb: don't resume vfork parent while child is still running Andrew Burgess
2023-07-18 20:42 ` Simon Marchi
2023-07-21 9:47 ` Andrew Burgess [this message]
2023-07-23 8:53 ` Andrew Burgess
2023-08-16 14:02 ` Andrew Burgess
2023-08-17 6:36 ` Tom de Vries
2023-08-17 7:01 ` Tom de Vries
2023-08-17 8:06 ` Tom de Vries
2023-08-17 8:22 ` Tom de Vries
2023-07-04 15:22 ` [PATCHv2 6/8] gdb/testsuite: expand gdb.base/foll-vfork.exp Andrew Burgess
2023-07-05 11:22 ` Aktemur, Tankut Baris
2023-07-04 15:22 ` [PATCHv2 7/8] gdb/testsuite: remove use of sleep from gdb.base/foll-vfork.exp Andrew Burgess
2023-07-04 15:22 ` [PATCHv2 8/8] gdb: additional debug output in infrun.c and linux-nat.c Andrew Burgess
2023-07-05 11:30 ` [PATCHv2 0/8] Some vfork related fixes Aktemur, Tankut Baris
2023-07-17 8:53 ` Andrew Burgess
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y1j9fvfr.fsf@redhat.com \
--to=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=simark@simark.ca \
--cc=tankut.baris.aktemur@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).