[Bug gdb/26754] Race condition when resuming threads and one does an exec

From: "palves at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug gdb/26754] Race condition when resuming threads and one does an exec
Date: Thu, 19 Nov 2020 20:05:50 +0000
Message-ID: <bug-26754-4717-l2cRbhjmTP@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-26754-4717@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=26754

Pedro Alves <palves at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palves at redhat dot com

--- Comment #3 from Pedro Alves <palves at redhat dot com> ---
Quite a nasty race.  I'm not seeing how this can be properly fixed without
kernel help.

The main problem comes from the fact that when PTRACE_EVENT_EXEC is reported,
the PID of the thread that exec'ed has already changed to the tgid.  This fact
is described here:

 https://github.com/strace/strace/blob/master/README-linux-ptrace

 ~~~
        1.x execve under ptrace.

 During execve, kernel destroys all other threads in the process, and
 resets execve'ing thread tid to tgid (process id). This looks very
 confusing to tracers:
 ~~~

Also from here, in GDB:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=f1b2c744bed9be01ff21a74e86d2a45b60d4eb65;hb=HEAD#l167

 ~~~
 Exec events
 ===========

 The case of a thread group (process) with 3 or more threads, and a
 thread other than the leader execs is worth detailing:

 On an exec, the Linux kernel destroys all threads except the execing
 one in the thread group, and resets the execing thread's tid to the
 tgid.  No exit notification is sent for the execing thread -- from the
 ptracer's perspective, it appears as though the execing thread just
 vanishes.  Until we reap all other threads except the leader and the
 execing thread, the leader will be zombie, and the execing thread will
 be in `D (disc sleep)' state.  As soon as all other threads are
 reaped, the execing thread changes its tid to the tgid, and the
 previous (zombie) leader vanishes, giving place to the "new"
 leader.  */
 ~~~
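
To make the tid change concrete, here is a minimal sketch in C of how a tracer
can recognize the exec stop and ask the kernel which tid the thread had before
the exec.  The helper names is_exec_stop and former_tid_at_exec are made up for
illustration; the sketch assumes Linux 3.0 or later, where ptrace(2) documents
PTRACE_GETEVENTMSG as returning the former thread ID at a PTRACE_EVENT_EXEC
stop.  Note that this only helps once the event has been seen, so it does not
close the race.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: return nonzero if STATUS, as returned by
   waitpid, describes a PTRACE_EVENT_EXEC stop.  */
int is_exec_stop(int status)
{
    return WIFSTOPPED(status)
           && (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8));
}

/* Hypothetical helper: at an exec stop reported for TGID, ask the
   kernel for the tid the exec'ing thread had before the exec.
   Assumes Linux >= 3.0.  Returns -1 on error.  */
pid_t former_tid_at_exec(pid_t tgid)
{
    unsigned long msg = 0;

    assert(tgid > 0);
    if (ptrace(PTRACE_GETEVENTMSG, tgid, NULL, &msg) == -1)
        return -1;
    return (pid_t) msg;
}
```

A tracer would call is_exec_stop on every status it gets from waitpid, and
former_tid_at_exec only when that returns nonzero.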

I tried to think of workarounds, like always resuming the leader thread before
any other thread, but that doesn't really work, because there are many
scenarios where we can't do that.  For example, the program has two threads,
thread 1 (leader) and thread 2, and the user only resumes thread 2.  Then,
after a while, the user decides to resume thread 1, but that coincides exactly
with thread 2 exec'ing and changing its tid to the tgid...  Boom, GDB ends up
resuming the exec'ed thread by mistake, without processing the
PTRACE_EVENT_EXEC.
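
The scenario can be sketched with a toy model in C.  This is not real ptrace
code and not GDB's data structures: every name (toy_thread, toy_exec,
toy_resume, ...) and the tids 100 and 101 are made up, and the model collapses
the kernel's exec steps into one.  It only shows that a resume addressed to the
leader's tid lands on a different thread once thread 2 has exec'ed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: not the real kernel or GDB data structures.  */
struct toy_thread {
    int tid;
    bool alive;
    bool runs_new_image;      /* true once this thread has exec'ed */
    bool exec_event_pending;  /* exec event not yet seen by the tracer */
};

struct toy_process {
    int tgid;
    struct toy_thread threads[2];  /* [0] = leader, [1] = thread 2 */
};

/* Kernel side: thread TID execs.  Every other thread goes away and
   the exec'ing thread takes over the tgid as its tid.  */
void toy_exec(struct toy_process *p, int tid)
{
    assert(p != NULL);
    for (size_t i = 0; i < 2; i++) {
        struct toy_thread *t = &p->threads[i];
        if (t->tid == tid) {
            t->tid = p->tgid;
            t->runs_new_image = true;
            t->exec_event_pending = true;
        } else {
            t->alive = false;
        }
    }
}

/* Tracer side: resume whatever currently has TID.  Returns the thread
   actually resumed, or NULL if there is none (think ESRCH).  */
struct toy_thread *toy_resume(struct toy_process *p, int tid)
{
    assert(p != NULL);
    for (size_t i = 0; i < 2; i++)
        if (p->threads[i].alive && p->threads[i].tid == tid)
            return &p->threads[i];
    return NULL;
}

/* The scenario: thread 2 (tid 101) execs, then the user resumes
   "thread 1" (tid 100) before the exec event has been processed.  */
bool toy_race_hits_execed_thread(void)
{
    struct toy_process p = {
        100, { { 100, true, false, false }, { 101, true, false, false } }
    };

    toy_exec(&p, 101);
    struct toy_thread *t = toy_resume(&p, 100);
    return t != NULL && t->runs_new_image && t->exec_event_pending;
}
```

In this model toy_race_hits_execed_thread returns true: the thread that gets
resumed runs the new image and still has its exec event pending.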

Always using PTRACE_SYSCALL instead of PTRACE_CONT and looking for the exec
syscall entry doesn't work as a workaround either, because:
  #1, it stops before the exec has actually happened, and the exec may fail
  #2, it would be horribly inefficient, as it would stop all threads for
      all syscalls, including non-exec ones.
  #3, it can't replace PTRACE_SINGLESTEP

I think that we would need a new event, similar to PTRACE_EVENT_EXEC, but one
that is reported after all other threads in the process have exited (and
reported their exit) and _before_ the tid of the execing thread is changed.
Let's call it PTRACE_EVENT_ALMOST_EXEC.  At this point, like with
PTRACE_EVENT_EXEC (before other threads are reaped), the previous leader would
be in zombie state.  The process would already have loaded the new address
space when PTRACE_EVENT_ALMOST_EXEC is reported.  So if GDB happens to try to
resume the leader just while some other thread execs, it would fail to
ptrace-resume it with ESRCH, because the leader would be a zombie.  GDB would
react to PTRACE_EVENT_ALMOST_EXEC like it reacts to PTRACE_EVENT_EXEC -- by
loading the new symbols, and installing breakpoints in the new address space.
Except it wouldn't destroy the non-leader thread yet -- that would happen on
PTRACE_EVENT_EXEC.
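
As a rough sketch of that split, in C: the tracer would do the symbol and
breakpoint work on the new event and leave only the thread cleanup for
PTRACE_EVENT_EXEC.  PTRACE_EVENT_ALMOST_EXEC does not exist in Linux, so the
constant TOY_PTRACE_EVENT_ALMOST_EXEC and its value are invented here, as are
the tracer_action flags and both function names.

```c
#include <assert.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Made-up value for illustration only; no such event exists in Linux.  */
#define TOY_PTRACE_EVENT_ALMOST_EXEC 100

/* Hypothetical tracer-side actions, combined as bit flags.  */
enum tracer_action {
    ACT_NONE = 0,
    ACT_LOAD_NEW_SYMBOLS = 1 << 0,
    ACT_INSERT_BREAKPOINTS = 1 << 1,
    ACT_DELETE_NON_LEADER_THREAD = 1 << 2
};

/* Extract the ptrace event code from a waitpid status.  Only
   meaningful for stop statuses.  */
int ptrace_event_of(int status)
{
    assert(WIFSTOPPED(status));
    return (status >> 16) & 0xffff;
}

/* What the tracer would do for each event under the proposal.  */
unsigned actions_for_event(int event)
{
    switch (event) {
    case TOY_PTRACE_EVENT_ALMOST_EXEC:
        /* New address space is loaded, tid has not changed yet.  */
        return ACT_LOAD_NEW_SYMBOLS | ACT_INSERT_BREAKPOINTS;
    case PTRACE_EVENT_EXEC:
        /* The tid has now changed to the tgid.  */
        return ACT_DELETE_NON_LEADER_THREAD;
    default:
        return ACT_NONE;
    }
}
```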

