From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 8855E39ACC6E; Thu, 19 Nov 2020 20:05:50 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8855E39ACC6E
From: "palves at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug gdb/26754] Race condition when resuming threads and one does an
 exec
Date: Thu, 19 Nov 2020 20:05:50 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gdb
X-Bugzilla-Component: gdb
X-Bugzilla-Version: HEAD
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: palves at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-26754-4717-l2cRbhjmTP@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-26754-4717@http.sourceware.org/bugzilla/>
References: <bug-26754-4717@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gdb-prs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-prs mailing list <gdb-prs.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-prs>,
 <mailto:gdb-prs-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-prs/>
List-Help: <mailto:gdb-prs-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-prs>,
 <mailto:gdb-prs-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Thu, 19 Nov 2020 20:05:50 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26754

Pedro Alves <palves at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palves at redhat dot com
--- Comment #3 from Pedro Alves <palves at redhat dot com> ---
Quite a nasty race.  I'm not seeing how this can be properly fixed without
kernel help.

The main problem comes from the fact that when PTRACE_EVENT_EXEC is reported,
the PID of the thread that exec'ed has already changed to the tgid.  This
fact is described here:

 https://github.com/strace/strace/blob/master/README-linux-ptrace

 ~~~
        1.x execve under ptrace.

 During execve, kernel destroys all other threads in the process, and
 resets execve'ing thread tid to tgid (process id). This looks very
 confusing to tracers:
 ~~~

Also from here, in GDB:


https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=f1b2c744bed9be01ff21a74e86d2a45b60d4eb65;hb=HEAD#l167

 ~~~
 Exec events
 ===========

 The case of a thread group (process) with 3 or more threads, and a
 thread other than the leader execs is worth detailing:

 On an exec, the Linux kernel destroys all threads except the execing
 one in the thread group, and resets the execing thread's tid to the
 tgid.  No exit notification is sent for the execing thread -- from the
 ptracer's perspective, it appears as though the execing thread just
 vanishes.  Until we reap all other threads except the leader and the
 execing thread, the leader will be zombie, and the execing thread will
 be in `D (disc sleep)' state.  As soon as all other threads are
 reaped, the execing thread changes its tid to the tgid, and the
 previous (zombie) leader vanishes, giving place to the "new"
 leader.  */
 ~~~
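
For reference, a minimal sketch of how a tracer can recognize this stop from
a waitpid status, assuming PTRACE_O_TRACEEXEC was set on the tracee.  The
helper names (is_exec_event, former_tid_of_exec) are made up for
illustration and are not GDB's.  Per ptrace(2), since Linux 3.0
PTRACE_GETEVENTMSG at this stop returns the execing thread's former tid:

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: true if STATUS (as returned by waitpid) is a
   PTRACE_EVENT_EXEC stop.  Such stops are reported as a SIGTRAP stop
   with the event number stored in the byte above the signal.  */
bool
is_exec_event (int status)
{
  return (WIFSTOPPED (status)
          && (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)));
}

/* Hypothetical helper: ask the kernel which tid the execing thread had
   before it was reset to the tgid.  Only meaningful while TGID is
   stopped at PTRACE_EVENT_EXEC.  Returns -1 on error.  */
long
former_tid_of_exec (pid_t tgid)
{
  unsigned long msg = 0;

  if (ptrace (PTRACE_GETEVENTMSG, tgid, NULL, &msg) == -1)
    return -1;
  return (long) msg;
}
```

Note that by the time either helper can run, the tid has already changed,
so this only tells the tracer what happened after the fact.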


I tried to think of workarounds, like always resuming the leader thread
before any other thread, but that doesn't really work, because there are many
scenarios where we can't do that.  For example, the program has two
threads, thread 1 (leader) and thread 2, and the user only resumes thread 2.
Then, after a while, the user decides to resume thread 1, but that
coincides exactly with thread 2 exec'ing and changing its tid to the tgid...
Boom, GDB ends up resuming the exec'ed thread by mistake, without processing
the PTRACE_EVENT_EXEC.
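
To make the scenario concrete, here is a toy model of it in plain C.  It
makes no ptrace calls and only captures the tid bookkeeping; all names
(toy_kernel, toy_exec, toy_resume) are invented for this illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one "kernel-side" task per slot.  */
struct toy_task
{
  int tid;        /* tid the kernel currently answers to */
  bool alive;
  bool execed;    /* true once this task has gone through exec */
  bool running;
};

struct toy_kernel
{
  int tgid;
  struct toy_task tasks[2];   /* [0] = leader, [1] = thread 2 */
};

/* Non-leader task IDX execs: the old leader vanishes and the execing
   task takes over the tgid as its tid.  It then sits at its exec stop,
   waiting for the tracer to process PTRACE_EVENT_EXEC.  */
void
toy_exec (struct toy_kernel *k, int idx)
{
  k->tasks[0].alive = false;
  k->tasks[idx].tid = k->tgid;
  k->tasks[idx].execed = true;
  k->tasks[idx].running = false;
}

/* Model of a resume request addressed by tid.  Returns the task that
   actually got resumed, or NULL if no live task has that tid.  */
struct toy_task *
toy_resume (struct toy_kernel *k, int tid)
{
  for (int i = 0; i < 2; i++)
    if (k->tasks[i].alive && k->tasks[i].tid == tid)
      {
        k->tasks[i].running = true;
        return &k->tasks[i];
      }
  return NULL;
}
```

If the debugger decides to resume the tgid meaning the old leader, and
toy_exec (&k, 1) happens first, toy_resume (&k, k.tgid) returns the exec'ed
task: the resume lands on a thread whose exec stop was never processed.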

Always using PTRACE_SYSCALL instead of PTRACE_CONT and looking for exec
syscall entry doesn't work for working around this, because:
  #1, it stops before the exec actually happened, and the exec may fail
  #2, it would be horribly inefficient as it would stop all threads for
      all syscalls, including non-exec ones.
  #3, it can't replace PTRACE_SINGLESTEP

I think that we would need a new event, similar to PTRACE_EVENT_EXEC but
that is reported after all other threads in the process have exited (and
reported their exit) and _before_ the tid of the execing thread is changed.
Let's call it PTRACE_EVENT_ALMOST_EXEC.  At this point, like with
PTRACE_EVENT_EXEC (before other threads are reaped), the previous leader
would be in zombie state.  The process would have already loaded the new
address space when PTRACE_EVENT_ALMOST_EXEC is reported.  So if GDB happens
to try to resume the leader just while some other thread execs, it would
fail to ptrace-resume it with ESRCH because it would be a zombie.  GDB would
react to PTRACE_EVENT_ALMOST_EXEC like it reacts to PTRACE_EVENT_EXEC -- by
loading the new symbols, and installing breakpoints in the new address
space.  Except it wouldn't destroy the non-leader thread yet -- that would
happen on PTRACE_EVENT_EXEC.
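
A rough sketch of how a tracer's event dispatch might look under this
proposal.  PTRACE_EVENT_ALMOST_EXEC does not exist in any kernel; the
constant below and its value are placeholders, and the action names are
invented here rather than taken from GDB:

```c
#include <sys/ptrace.h>

/* Placeholder: no such event exists in Linux.  The value is arbitrary,
   picked only so that it differs from the real PTRACE_EVENT_* numbers.  */
#define TOY_PTRACE_EVENT_ALMOST_EXEC 0x7e

enum toy_action
{
  TOY_IGNORE = 0,
  TOY_LOAD_SYMBOLS_AND_INSERT_BREAKPOINTS = 1 << 0,
  TOY_DELETE_NON_LEADER_THREAD = 1 << 1,
};

/* Map a ptrace event number to what the tracer would do, per the
   proposal: the new-address-space work moves to the earlier event, and
   only the thread list fixup stays on PTRACE_EVENT_EXEC.  */
int
toy_actions_for_event (int event)
{
  switch (event)
    {
    case TOY_PTRACE_EVENT_ALMOST_EXEC:
      return TOY_LOAD_SYMBOLS_AND_INSERT_BREAKPOINTS;
    case PTRACE_EVENT_EXEC:
      return TOY_DELETE_NON_LEADER_THREAD;
    default:
      return TOY_IGNORE;
    }
}
```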
