From: "simark at simark dot ca" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug gdb/26754] New: Race condition when resuming threads and one does an exec
Date: Mon, 19 Oct 2020 01:19:24 +0000
Message-ID: <bug-26754-4717@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=26754

            Bug ID: 26754
           Summary: Race condition when resuming threads and one does an exec
           Product: gdb
           Version: HEAD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: simark at simark dot ca
  Target Milestone: ---

I stumbled on this while trying to write a test for when a non-leader thread
displace-steps an exec syscall instruction.

The thing to remember here is that on Linux, when a non-leader thread does an
exec syscall, all non-main threads are removed and only the main thread starts
executing the new executable (I don't know what happens in the kernel's data
structures exactly, but at least that's how it looks from userspace, so that's
the important part).

Things go wrong when GDB tries to resume multiple threads, necessarily one
after the other, and one of these threads (a non-leader one) executes an exec
syscall before the main thread is resumed.

I'll attach the source for a reproducer. It can be compiled with:

    $ gcc test.c -g3 -O0 -o test -pthread test_asm.S -fPIE

I run it with:

    $ ../gdb -q -nx --data-directory=data-directory ./test -ex "b the_syscall" -ex "b main" -ex r -ex c

and then just "continue" to trigger the problem.

Since it involves a race, different things can happen if you execute the
reproducer multiple times. I'll describe one possible outcome.
By applying this small patch to GDB, you can pretty much guarantee to have this
outcome:

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 8ae39a2877b3..450a7a37bc5b 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2246,6 +2246,9 @@ do_target_resume (ptid_t resume_ptid, int step, enum gdb_signal sig)
 
   target_commit_resume ();
 
+  for (int i = 0; i < 500; i++)
+    usleep (1000);
+
   if (target_can_async_p ())
     target_async (1);
 }
---

Note that I use many short usleep calls rather than one sleep call because
sleep would otherwise get interrupted by incoming SIGCHLD signals. I'm not sure
it's needed, but I just wanted to play it safe and really make GDB sleep for a
while.

So this is the sequence of events. Let's assume we have a process with pid 1000
and two threads with tid 1000 (the main one) and 1001 (a user-created one,
which will execute the exec). Both threads are stopped. Thread 1001 is stopped
just before an exec syscall.

1. User does "continue"
2. Since thread 1001 needs to initiate a displaced-step, it is resumed first
   (the displaced-step is not really at fault here, but having it means that we
   resume this particular thread first, so it helps trigger the issue).
3. The now-resumed thread 1001 does an exec syscall
4. The kernel deletes all non-main threads of process 1000 (so, deletes thread
   1001). It sets up thread 1000 with the new executable. It sends GDB (the
   ptracer) a PTRACE_EVENT_EXEC. The thread is stopped, as it will need GDB to
   continue it before it starts executing the code of the new executable.
5. GDB, still processing the "continue" command, now resumes thread 1000 -
   which succeeds.

The thing is that the thread 1000 that GDB resumes now isn't the thread it
thinks it is resuming. The thread that GDB thinks it is resuming is the one
stopped somewhere in the original executable. In reality, it resumes the
post-exec thread 1000, stopped on the PTRACE_EVENT_EXEC event, about to start
execution of the new executable.
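The five steps above can be replayed in a small toy model. This is only a sketch for illustration, not GDB or kernel code: `toy_process`, `toy_resume`, `toy_exec_from_other` and `replay_race` are all hypothetical names, and the "kernel" here is just a struct that tracks why each thread is stopped. It shows that the resume of tid 1000 succeeds while acting on a thread in a different state than the debugger believes.

```c
#include <stdbool.h>

/* Why a toy thread is stopped, from the toy kernel's point of view.  */
enum stop_reason
{
  NOT_STOPPED,            /* Running, or nothing to resume.  */
  STOPPED_IN_OLD_PROGRAM, /* Stopped somewhere in the original executable.  */
  STOPPED_AT_EXEC_EVENT   /* Stopped to report the exec event.  */
};

struct toy_thread
{
  int tid;
  bool exists;
  enum stop_reason stop;
};

/* Hypothetical two-thread process: the leader and one other thread.  */
struct toy_process
{
  struct toy_thread leader;
  struct toy_thread other;
};

/* Toy kernel: resume TID.  Returns the stop the thread was in just
   before being resumed, or NOT_STOPPED if nothing could be resumed.  */
static enum stop_reason
toy_resume (struct toy_process *p, int tid)
{
  struct toy_thread *t;

  if (tid == p->leader.tid)
    t = &p->leader;
  else if (tid == p->other.tid)
    t = &p->other;
  else
    return NOT_STOPPED;

  if (!t->exists || t->stop == NOT_STOPPED)
    return NOT_STOPPED;

  enum stop_reason previous = t->stop;
  t->stop = NOT_STOPPED;
  return previous;
}

/* Toy kernel: the non-leader thread execs.  It disappears, and the
   leader is left stopped at the exec event (steps 3 and 4).  */
static void
toy_exec_from_other (struct toy_process *p)
{
  if (!p->other.exists || p->other.stop != NOT_STOPPED)
    return; /* Only a running thread can exec.  */

  p->other.exists = false;
  p->leader.stop = STOPPED_AT_EXEC_EVENT;
}

/* Replay the sequence.  The debugger believes it resumes a leader that
   is STOPPED_IN_OLD_PROGRAM; the return value is the stop the leader
   was really in when the resume request went through.  */
enum stop_reason
replay_race (void)
{
  struct toy_process p = {
    { 1000, true, STOPPED_IN_OLD_PROGRAM },
    { 1001, true, STOPPED_IN_OLD_PROGRAM }
  };

  toy_resume (&p, 1001);        /* Step 2: thread 1001 goes first.  */
  toy_exec_from_other (&p);     /* Steps 3-4: exec wins the race.  */
  return toy_resume (&p, 1000); /* Step 5: resume "succeeds".  */
}
```

In this model `replay_race ()` returns `STOPPED_AT_EXEC_EVENT` rather than `STOPPED_IN_OLD_PROGRAM`: the resume request is accepted, but the exec event is never examined before the thread runs again.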
So GDB ends up resuming this thread 1000, and all kinds of funny things can
happen after that. Normally, on exec, the linux-nat target should report the
exec event to the core, which would call follow_exec, which would install
breakpoints in the fresh program space, among other things. None of that is
done and the program is resumed, so one visible consequence is that any
breakpoints that were set are not effective.

This is what it looks like when running the reproducer:

---8<---
$ ./gdb -q -nx --data-directory=data-directory ./test -ex "b the_syscall" -ex "b main" -ex r -ex c
Reading symbols from ./test...
Breakpoint 1 at 0x128a: file test_asm.S, line 11.
Breakpoint 2 at 0x11fb: file test.c, line 27.
Starting program: /home/simark/build/binutils-gdb/gdb/test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".

Breakpoint 2, main (argc=1, argv=0x7fffffffe018) at test.c:27
27      {
Continuing.
[New Thread 0x7ffff7da6640 (LWP 2813391)]
[Switching to Thread 0x7ffff7da6640 (LWP 2813391)]

Thread 2 "test" hit Breakpoint 1, the_syscall () at test_asm.S:11
11              syscall
(gdb) c    <--- this is where things start to go wrong
Continuing.
Welcome
[Thread 0x7ffff7da7740 (LWP 2813387) exited]
...hangs...
--->8---

Normally, we should break on "main", but we missed it and the program ended.

I think that at this point, GDB still believes there's a thread 2813391
executing that hasn't stopped.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
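For reference, this is roughly what the exec event mentioned above looks like from a tracer's side. It is a minimal sketch assuming Linux and glibc, not how linux-nat is written: `observe_exec_event` and `finish` are hypothetical names, the tracee is single-threaded, and `/bin/sh` is assumed to exist. The point is that the tracee stays stopped at the exec event until the tracer acts on it, which is the window where a debugger would normally do its post-exec bookkeeping.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill and reap the child, then hand back RESULT.  */
static int
finish (pid_t pid, int result)
{
  int status;

  kill (pid, SIGKILL);
  waitpid (pid, &status, 0);
  return result;
}

/* Returns 1 if the tracer saw a PTRACE_EVENT_EXEC stop, 0 if the
   tracee got past the exec without one, -1 on error.  */
int
observe_exec_event (void)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Tracee: ask to be traced, stop so that the tracer can set
         its options, then exec.  */
      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL) != 0)
        _exit (126);
      raise (SIGSTOP);
      execl ("/bin/sh", "sh", "-c", "exit 0", (char *) NULL);
      _exit (127); /* Only reached if execl failed.  */
    }

  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  if (!WIFSTOPPED (status))
    return -1; /* Child is already gone and reaped.  */

  /* Ask for exec events, then let the tracee run into its exec.  */
  if (ptrace (PTRACE_SETOPTIONS, pid, NULL,
              (void *) (long) PTRACE_O_TRACEEXEC) != 0
      || ptrace (PTRACE_CONT, pid, NULL, NULL) != 0)
    return finish (pid, -1);

  if (waitpid (pid, &status, 0) != pid)
    return -1;
  if (!WIFSTOPPED (status))
    return 0; /* Exited without reporting an exec stop.  */

  /* The tracee is now stopped at the exec event.  A debugger would
     examine this event before resuming; this sketch only checks it.  */
  int saw_exec = (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8));
  return finish (pid, saw_exec);
}
```

The check on `status >> 8` follows the encoding documented in ptrace(2) for `PTRACE_O_TRACEEXEC`. In the race described in this report, the resume request for the leader reaches the kernel while the thread sits in exactly this stop, before the event has been read with waitpid.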