public inbox for gdb-prs@sourceware.org
* [Bug gdb/26754] New: Race condition when resuming threads and one does an exec
@ 2020-10-19  1:19 simark at simark dot ca
From: simark at simark dot ca @ 2020-10-19  1:19 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=26754

            Bug ID: 26754
           Summary: Race condition when resuming threads and one does an
                    exec
           Product: gdb
           Version: HEAD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: simark at simark dot ca
  Target Milestone: ---

I stumbled on this while trying to write a test for when a non-leader thread
displace-steps an exec syscall instruction.  The thing to remember here is that
on Linux, when a non-leader thread does an exec syscall, all non-main threads
are removed and only the main thread starts executing the new executable (I
don't know what happens in the kernel's data structures exactly, but at least
that's how it looks from userspace, so that's the important part).
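
As a rough illustration of that userspace view (this is not the attached
reproducer), here is a minimal sketch.  The function names, the use of
/bin/sh and the exit codes are assumptions made for the example.  A forked
child starts a second thread that execs, while its main thread would
otherwise block forever; the parent only gets an exit status because the exec
replaces the whole thread group.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Body of the non-leader thread: replace the whole process image.
   "/bin/sh" and the exit code 7 are arbitrary choices for this sketch.  */
static void *
exec_in_thread (void *arg)
{
  (void) arg;
  execl ("/bin/sh", "sh", "-c", "exit 7", (char *) NULL);
  _exit (99);			/* Only reached if the exec failed.  */
}

/* Fork a child whose non-leader thread execs.  Return the exit code seen
   by the parent, or -1 on error.  */
int
exec_from_non_leader (void)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      pthread_t thread;
      if (pthread_create (&thread, NULL, exec_in_thread, NULL) != 0)
	_exit (98);
      for (;;)
	pause ();		/* The leader never exits on its own.  */
    }

  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

If the exec only affected the calling thread, the child's main thread would
stay in pause () and waitpid would never return.  Getting exit code 7 back
under the same pid shows that the new program took over the process.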

Things go wrong when GDB tries to resume multiple threads, necessarily one
after the other, and one of these threads (a non-leader one) executes an exec
syscall before the main thread is resumed.

I'll attach the source for a reproducer.  It can be compiled with:

  $ gcc test.c -g3 -O0 -o test -pthread test_asm.S -fPIE

I run it with

  $ ../gdb -q -nx --data-directory=data-directory ./test -ex "b the_syscall" \
      -ex "b main" -ex r -ex c

and then just "continue" to trigger the problem.

Since it involves a race, different things can happen if you execute the
reproducer multiple times.  I'll describe one possible outcome.  By applying
this small patch to GDB, you can pretty much guarantee this outcome:

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 8ae39a2877b3..450a7a37bc5b 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2246,6 +2246,9 @@ do_target_resume (ptid_t resume_ptid, int step, enum gdb_signal sig)

   target_commit_resume ();

+  for (int i = 0; i < 500; i++)
+    usleep (1000);
+
   if (target_can_async_p ())
     target_async (1);
 }

---

Note that many short usleep calls are used instead of one sleep because sleep
otherwise gets interrupted by incoming SIGCHLD signals.  I'm not sure it's
needed but I just wanted to play it safe and really make GDB sleep for a while.

So this is the sequence of events.  Let's assume we have a process with pid
1000 and two threads with tid 1000 (the main one) and 1001 (a user-created one,
which will execute the exec).  Both threads are stopped.  Thread 1001 is
stopped just before an exec syscall.

1. User does "continue"
2. Since thread 1001 needs to initiate a displaced-step, it is resumed first
(the displaced-step is not really at fault here, but it causes this particular
thread to be resumed first, so it helps trigger the issue).
3. The now-resumed thread 1001 does an exec syscall
4. The kernel deletes all non-main threads of process 1000 (so, deletes thread
1001).  It sets up thread 1000 with the new executable.  It sends GDB (the
ptracer) a PTRACE_EVENT_EXEC.  The thread is stopped as it will need GDB to
continue it before it starts executing the code of the new executable.
5. GDB, still processing the "continue" command, now resumes thread 1000 -
which succeeds.

The thing is that the thread 1000 that GDB resumes now isn't the thread it
thinks it is resuming.  The thread that GDB thinks it is resuming is the one
stopped somewhere in the original executable.  In reality, it resumes the
post-exec thread 1000, stopped on the PTRACE_EVENT_EXEC event, about to start
execution of the new executable.
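
The sequence of events above can be replayed in a toy model.  Everything in
this sketch (the struct, its fields and the function names) is hypothetical
and only mirrors the five steps; it is not GDB or kernel code.

```c
#include <stdbool.h>

/* Hypothetical toy model of one thread as seen by the kernel.  */
struct toy_thread
{
  int tid;
  bool exists;		/* Does the kernel still have this thread?  */
  bool stopped;		/* Is it stopped under the tracer?  */
  bool at_exec_event;	/* Is it stopped on an unreported exec event?  */
};

/* Kernel side of steps 3-4: WORKER execs, so it disappears and LEADER
   ends up stopped at the exec event of the new executable.  */
static void
toy_exec (struct toy_thread *leader, struct toy_thread *worker)
{
  worker->exists = false;
  leader->stopped = true;
  leader->at_exec_event = true;
}

/* Debugger side: resume T without first checking for pending events.
   Return true if this resume skipped over an exec event.  */
static bool
toy_resume (struct toy_thread *t)
{
  bool skipped = t->exists && t->stopped && t->at_exec_event;

  t->stopped = false;
  t->at_exec_event = false;
  return skipped;
}

/* Replay steps 1-5.  Return true if the exec event was skipped.  */
bool
toy_continue_races_with_exec (void)
{
  struct toy_thread leader = { 1000, true, true, false };
  struct toy_thread worker = { 1001, true, true, false };

  toy_resume (&worker);			/* Step 2: thread 1001 goes first.  */
  toy_exec (&leader, &worker);		/* Steps 3-4: it execs.  */
  return toy_resume (&leader);		/* Step 5: resume "thread 1000".  */
}
```

In this model the second toy_resume returns true: the tid is the same, but
the thread being resumed is the post-exec one.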

So GDB ends up resuming this thread 1000, and all kinds of funny things can
happen after that. Normally, on exec, the linux-nat target should report the
exec event to the core, which would call follow_exec, which would install
breakpoints in the fresh program space, among other things.  None of that is
done and the program is resumed, so one visible consequence is that any
breakpoints that were set are not effective.
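
For reference, this is roughly what a ptracer has to see before it can react
to an exec.  It is a standalone sketch using plain ptrace, not linux-nat
code; the function name and the use of /bin/sh are assumptions for the
example.  The tracer enables PTRACE_O_TRACEEXEC and then checks the wait
status for the exec event before continuing the tracee.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Trace a child across an exec.  Return 1 if a PTRACE_EVENT_EXEC stop was
   observed, 0 if the child finished without one, -1 on error.  */
int
observe_exec_event (void)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL) != 0)
	_exit (98);
      raise (SIGSTOP);		/* Let the tracer set its options first.  */
      execl ("/bin/sh", "sh", "-c", "exit 0", (char *) NULL);
      _exit (99);		/* Only reached if the exec failed.  */
    }

  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFSTOPPED (status))
    return -1;

  if (ptrace (PTRACE_SETOPTIONS, pid, NULL,
	      (void *) (long) PTRACE_O_TRACEEXEC) != 0)
    {
      kill (pid, SIGKILL);
      waitpid (pid, &status, 0);
      return -1;
    }

  int seen = 0;
  ptrace (PTRACE_CONT, pid, NULL, NULL);	/* Suppress the SIGSTOP.  */
  while (waitpid (pid, &status, 0) == pid)
    {
      if (WIFEXITED (status) || WIFSIGNALED (status))
	break;

      if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)))
	{
	  /* The point where a debugger would handle the exec.  */
	  seen = 1;
	  ptrace (PTRACE_CONT, pid, NULL, NULL);
	}
      else
	/* Any other stop: pass the signal through to the tracee.  */
	ptrace (PTRACE_CONT, pid, NULL, (void *) (long) WSTOPSIG (status));
    }

  return seen;
}
```

In the scenario described here, the resume in step 5 happens before GDB has
looked at that status, so the exec handling never runs.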

This is what it looks like when running the reproducer:

---8<---
$ ./gdb -q -nx --data-directory=data-directory ./test -ex "b the_syscall" -ex "b main" -ex r -ex c
Reading symbols from ./test...
Breakpoint 1 at 0x128a: file test_asm.S, line 11.
Breakpoint 2 at 0x11fb: file test.c, line 27.
Starting program: /home/simark/build/binutils-gdb/gdb/test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".

Breakpoint 2, main (argc=1, argv=0x7fffffffe018) at test.c:27
27      {
Continuing.
[New Thread 0x7ffff7da6640 (LWP 2813391)]
[Switching to Thread 0x7ffff7da6640 (LWP 2813391)]

Thread 2 "test" hit Breakpoint 1, the_syscall () at test_asm.S:11
11              syscall
(gdb) c   <--- this is where things start to go wrong
Continuing.
Welcome
[Thread 0x7ffff7da7740 (LWP 2813387) exited]
...hangs...
--->8---

Normally, we should break on "main", but we missed it and the program ended.  I
think that at this point, GDB still believes there's a thread 2813391
executing that hasn't stopped.


Thread overview: 11+ messages
2020-10-19  1:19 [Bug gdb/26754] New: Race condition when resuming threads and one does an exec simark at simark dot ca
2020-10-19  1:20 ` [Bug gdb/26754] " simark at simark dot ca
2020-10-19  1:21 ` simark at simark dot ca
2020-11-19 20:05 ` palves at redhat dot com
2020-11-19 20:19 ` simark at simark dot ca
2020-11-19 21:32 ` simark at simark dot ca
2020-11-19 22:57 ` palves at redhat dot com
2020-11-19 23:11 ` simark at simark dot ca
2020-11-19 23:52 ` palves at redhat dot com
2020-11-19 23:55 ` simark at simark dot ca
2022-09-29 16:58 ` simark at simark dot ca