public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Simon Marchi <simon.marchi@polymtl.ca>
To: Pedro Alves <pedro@palves.net>, gdb-patches@sourceware.org
Cc: Simon Marchi <simon.marchi@efficios.com>
Subject: Re: [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on)
Date: Fri, 1 Apr 2022 13:28:06 -0400	[thread overview]
Message-ID: <2bf14e68-7f2f-81cc-8c40-e847a6aca1a7@polymtl.ca> (raw)
In-Reply-To: <97adba8a-5007-c017-7730-18805d9dccc7@palves.net>

> Thinking through this, one concern I had is that if the vfork child gets stuck for some reason
> before execing or exiting (hey we're a debugger, bugs happen!), then pausing all threads makes
> it so that you can end up stuck in gdb and can't get back the prompt (if the child ignores
> SIGINT, otherwise it just dies), or if you continue in the background, you
> can't interrupt the parent threads.
> 
> E.g., with your series, and:
> 
> diff --git c/gdb/testsuite/gdb.threads/vfork-multi-thread.c w/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> index cd3dfcc1e65..6178de91dca 100644
> --- c/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> +++ w/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> @@ -25,6 +25,8 @@ static void *vforker(void *arg)
>      /* A vfork child is not supposed to mess with the state of the program,
>         but it is helpful for the purpose of this test.  */
>      release_main = 1;
> +    while (1)
> +      sleep (1);
>      _exit(7);
>    }
>  
> @@ -41,7 +43,7 @@ static void should_break_here(void) {}
>  
>  int main()
>  {
> -
> +  signal (SIGINT, SIG_IGN);
>    pthread_t thread;
>    int ret = pthread_create(&thread, NULL, vforker, NULL);
>    assert(ret == 0);
> (END)
>  
> 
> We get:
> 
> (gdb) r
> Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/vfork-multi-thread/vfork-multi-thread 
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7ffff7d8a700 (LWP 1713229)]
> [Detaching after vfork from child process 1713230]
> ^C^C^C^C^C^C
> (hung forever)
> 
> Or:
> 
> $ gdb -ex "set non-stop on" ...
> (gdb) r&
> Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/vfork-multi-thread/vfork-multi-thread 
> (gdb) [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7ffff7d8a700 (LWP 1345571)]
> [Detaching after vfork from child process 1345572]
> info threads 
>   Id   Target Id                                             Frame 
> * 1    Thread 0x7ffff7d8b740 (LWP 1345567) "vfork-multi-thr" (running)
>   2    Thread 0x7ffff7d8a700 (LWP 1345571) "vfork-multi-thr" (running)
> (gdb) interrupt 
> *nothing*
> (gdb) interrupt 
> *nothing*
> (gdb) info threads 
>   Id   Target Id                                             Frame 
> * 1    Thread 0x7ffff7d8b740 (LWP 1345567) "vfork-multi-thr" (running)
>   2    Thread 0x7ffff7d8a700 (LWP 1345571) "vfork-multi-thr" (running)
> (gdb) 
> 
> 
> And then you can't quit gdb:
> 
> (gdb) q
> A debugging session is active.
> 
>         Inferior 1 [process 1345567] will be killed.
> 
> Quit anyway? (y or n) y
> 
> ^C^C^C^C
> (hung forever)

Ah yeah... that sucks.

We can find a ton of cases where we can deadlock when dealing with stuck
vfork children.  You can run the above program on the shell and try
attaching the parent, for example.

> I guess the ^C issue will be fixed by my series reworking ctrl-c, leaving
> gdb in control of the terminal at all times..

What will happen then?  GDB receives the SIGINT, then sends a SIGSTOP to
the vfork parent thread.  Do you know if a vfork parent thread can be
stopped at that point (while the vfork child thread is stuck in an
infinite loop)?

> Bah, I guess that this might be a meh/schrug, since if the parent is
> single-threaded, then you can't interrupt it either.  Though at least GDB
> doesn't get you get into the stuck state.

What do you mean "doesn't get you into the stuck state"?  I don't see
what's different here between the single and multi thread cases.

> I wonder whether a better solution would be something completely different.
> Like, say, make it so that gdb always remains attached to the child until it
> execs/exits, in which case we would no longer need to remove breakpoints from the
> parent/child, and thus we could let all the parent's threads run as well.
> 
> We actually do that for follow-fork parent / detach-on-fork-on and
> vfork -- see inferior::pending_detach.

You mean "follow-fork child" I think.  In that case, we keep the parent
attached until the child exec or exits.

I think it would be possible to do what you describe.  If we did this, I
think we could make it so we never resume a vfork parent thread, to
avoid getting into the situation where we want to interrupt / stop a
stuck vfork parent thread.  In my experience that's not possible, this
thread will not report a stop until the vfork child has exited/execed.
Although I haven't tried with SIGSTOP.


> Anyhow, onward.  With your currently solution at least the common scenario
> works, and we have a new testcase that helps whatever redesign in future.

Yeah, I think that pragamatically it's a step forward, assuming a
"collaborating" vfork children that doesn't hang.

> Despite these concerns, I'm actually happy to get this into the tree.  I think
> it's still better than what we have today.

Cool, thanks.

>> +
>> +  /* The rest of the function assumes non-stop==off and
>> +     target-non-stop==off.
>> +
>> +     If a thread is waiting for a vfork-done event, it means breakpoints are out
>> +     for this inferior (well, program space in fact).  We don't want to resume
>> +     any thread other than the one waiting for vfork done, otherwise these other
>> +     threads could miss breakpoints.  So if a thread in the resumption set is
>> +     waiting for a vfork-done event, resume only that thread.
>> +
>> +     The resumption set width depends on whether schedule-multiple is on or off.
>> +
>> +     Note that if the target_resume interface was more flexible, we could be
>> +     smarter here when schedule-multiple is on.  For example, imagine 3
>> +     inferiors with 2 threads each (1.1, 1.2, 2.1, 2.2, 3.1 and 3.2).  Threads
>> +     2.1 and 3.2 are both waiting for a vfork-done event.  Then we could ask the
>> +     target(s) to resume:
>> +
>> +      - All threads of inferior 1
>> +      - Thread 2.1
>> +      - Thread 3.2
>> +
>> +     Since we don't have that flexibility (we can only pass one ptid), just
>> +     resume the first thread waiting for a vfork-done event we find (e.g. thread
>> +     2.1).  */
> 
> We actually have the flexibility, but we don't use it.  I mean, we could use
> the commit_resumed mechanism in all-stop as well, which would let us prepare such
> a resume with multiple target_resume calls and final commit.  For a rainy day.

Ah, yeah.  We could make internal_resume_ptid return a vector of ptids,
and have do_target_resume call target_resume multiple times.  Even
without commit_resumed... that should work?  I think?

>> +  if (sched_multi)
>> +    {
>> +      for (inferior *inf : all_non_exited_inferiors ())
>> +	if (inf->thread_waiting_for_vfork_done != nullptr)
>> +	  return inf->thread_waiting_for_vfork_done->ptid;
>> +    }
>> +  else if (current_inferior ()->thread_waiting_for_vfork_done != nullptr)
>> +    return current_inferior ()->thread_waiting_for_vfork_done->ptid;
>> +
>> +  return user_visible_resume_ptid (user_step);
>>  }
>>  
>>  /* Wrapper for target_resume, that handles infrun-specific
>> @@ -3258,6 +3360,19 @@ proceed (CORE_ADDR addr, enum gdb_signal siggnal)
>>  		continue;
> 
> 
>> diff --git a/gdb/testsuite/gdb.threads/vfork-multi-thread.c b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
>> new file mode 100644
>> index 000000000000..cd3dfcc1e653
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
>> @@ -0,0 +1,74 @@
>> +#include <assert.h>
> 
> Missing copyright header.

Done.

>> +#include <pthread.h>
>> +#include <stdio.h>
>> +#include <unistd.h>
>> +#include <sys/wait.h>
>> +
>> +// Start with:
>> +//
>> +//   ./gdb -nx -q --data-directory=data-directory repro -ex "tb break_here_first" -ex r -ex "b should_break_here"
>> +//
>> +// Then do "continue".
>> +//
>> +// The main thread will likely cross should_break_here while we are handling
>> +// the vfork and breakpoints are removed, therefore missing the breakpoint.
> 
> The whole file needs a pass to adjust it to GNU conventions/formatting.

Yeah, I did that after Lancelot's comments on the v2.

>> +
>> +static volatile int release_vfork = 0;
>> +static volatile int release_main = 0;
>> +
>> +static void *vforker(void *arg)
>> +{
>> +  while (!release_vfork);
> 
> I'd add some usleep(1) here, to avoid hogging the cpu.

Ok.

>> +
>> +  pid_t pid = vfork ();
>> +  if (pid == 0) {
>> +    /* A vfork child is not supposed to mess with the state of the program,
>> +       but it is helpful for the purpose of this test.  */
>> +    release_main = 1;
>> +    _exit(7);
>> +  }
>> +
>> +  int stat;
>> +  int ret = waitpid (pid, &stat, 0);
>> +  assert (ret == pid);
>> +  assert (WIFEXITED (stat));
>> +  assert (WEXITSTATUS (stat) == 7);
>> +
>> +  return NULL;
>> +}
>> +
>> +static void should_break_here(void) {}
>> +
>> +int main()
>> +{
>> +
>> +  pthread_t thread;
>> +  int ret = pthread_create(&thread, NULL, vforker, NULL);
>> +  assert(ret == 0);
>> +
>> +  /* We break here first, while the thread is stuck on `!release_fork`.  */
>> +  release_vfork = 1;
>> +
>> +  /* We set a breakpoint on should_break_here.
>> +
>> +     We then set "release_fork" from the debugger and continue.  The main
>> +     thread hangs on `!release_main` while the non-main thread vforks.  During
>> +     the window of time where the two processes have a shared address space
>> +     (after vfork, before _exit), GDB removes the breakpoints from the address
>> +     space.  During that window, only the vfork-ing thread (the non-main
>> +     thread) is frozen by the kernel.  The main thread is free to execute.  The
>> +     child process sets `release_main`, releasing the main thread. A buggy GDB
>> +     would let the main thread execute during that window, leading to the
>> +     breakpoint on should_break_here being missed.  A fixed GDB does not resume
>> +     the threads of the vforking process other than the vforking thread.  When
>> +     the vfork child exits, the fixed GDB resumes the main thread, after
>> +     breakpoints are reinserted, so the breakpoint is not missed.  */
>> +
>> +  while (!release_main);
> 
> Ditto, usleep.

I don't know about this one, if it would affect the ability of the test
to reproduce the failure.  For the failure to be reproduced, we want
that main thread to exit that loop as fast as possible once release_main
is set and dash through "should_break_here".  If we make it sleep, it
increases the chances that the vfork child calls exit before the main
thread crosses should_break_here, therefore not exposing the problem.

I gave it a try, and with a usleep I see some failures (when removing
the fix), meaning the test still does its job, so I can add it.  If I
try with a "sleep (1)" though, the failures disappear.

Simon

  reply	other threads:[~2022-04-01 17:28 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-17 16:27 [PATCH 0/8] Some fixes for handling vfork by multi-threaded programs Simon Marchi
2022-01-17 16:27 ` [PATCH 1/8] gdb/infrun: add reason parameter to stop_all_threads Simon Marchi
2022-03-31 15:05   ` Pedro Alves
2022-03-31 15:35     ` Simon Marchi
2022-01-17 16:27 ` [PATCH 2/8] gdb/linux-nat: remove check based on current_inferior in linux_handle_extended_wait Simon Marchi
2022-03-31 16:12   ` Pedro Alves
2022-03-31 17:06     ` Simon Marchi
2022-01-17 16:27 ` [PATCH 3/8] gdb: replace inferior::waiting_for_vfork_done with inferior::thread_waiting_for_vfork_done Simon Marchi
2022-03-31 18:17   ` Pedro Alves
2022-04-01 14:25     ` Simon Marchi
2022-01-17 16:27 ` [PATCH 4/8] gdb/infrun: add inferior parameters to stop_all_threads and restart_threads Simon Marchi
2022-03-31 18:17   ` Pedro Alves
2022-01-17 16:27 ` [PATCH 5/8] gdb/infrun: add logging statement to do_target_resume Simon Marchi
2022-03-31 18:18   ` Pedro Alves
2022-01-17 16:27 ` [PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on) Simon Marchi
2022-03-31 18:21   ` Pedro Alves
2022-04-01 17:28     ` Simon Marchi [this message]
2022-01-17 16:27 ` [PATCH 7/8] gdbserver: report correct status in thread stop race condition Simon Marchi
2022-03-31 18:21   ` Pedro Alves
2022-01-17 16:27 ` [PATCH 8/8] gdb: resume ongoing step after handling fork or vfork Simon Marchi
2022-03-31 18:22   ` Pedro Alves
2022-03-31 18:28   ` Pedro Alves
2022-04-01 18:42     ` Simon Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2bf14e68-7f2f-81cc-8c40-e847a6aca1a7@polymtl.ca \
    --to=simon.marchi@polymtl.ca \
    --cc=gdb-patches@sourceware.org \
    --cc=pedro@palves.net \
    --cc=simon.marchi@efficios.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).