From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 8855E39ACC6E; Thu, 19 Nov 2020 20:05:50 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8855E39ACC6E
From: "palves at redhat dot com"
To: gdb-prs@sourceware.org
Subject: [Bug gdb/26754] Race condition when resuming threads and one does an exec
Date: Thu, 19 Nov 2020 20:05:50 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gdb
X-Bugzilla-Component: gdb
X-Bugzilla-Version: HEAD
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: palves at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gdb-prs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-prs mailing list
List-Unsubscribe: ,
List-Archive:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Nov 2020 20:05:50 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26754

Pedro Alves changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palves at redhat dot com

--- Comment #3 from Pedro Alves ---
Quite a nasty race.  I'm not seeing how this can be properly fixed without
kernel help.

The main problem comes from the fact that when PTRACE_EVENT_EXEC is
reported, the PID of the thread that exec'ed has already changed to the
tgid.  This fact is described here:

 https://github.com/strace/strace/blob/master/README-linux-ptrace

~~~
        1.x execve under ptrace.

During execve, kernel destroys all other threads in the process, and resets
execve'ing thread tid to tgid (process id). This looks very confusing to
tracers:
~~~

Also from here, in GDB:

 https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=f1b2c744bed9be01ff21a74e86d2a45b60d4eb65;hb=HEAD#l167

~~~
   Exec events
   ===========

   The case of a thread group (process) with 3 or more threads, and a
   thread other than the leader execs is worth detailing:

   On an exec, the Linux kernel destroys all threads except the execing
   one in the thread group, and resets the execing thread's tid to the
   tgid.  No exit notification is sent for the execing thread -- from the
   ptracer's perspective, it appears as though the execing thread just
   vanishes.  Until we reap all other threads except the leader and the
   execing thread, the leader will be zombie, and the execing thread will
   be in `D (disc sleep)' state.  As soon as all other threads are
   reaped, the execing thread changes its tid to the tgid, and the
   previous (zombie) leader vanishes, giving place to the "new"
   leader.  */
~~~

I tried to think of workarounds, like always resuming the leader thread
before any other thread, but that doesn't really work, because there are
many scenarios where we can't do that.  For example, the program has two
threads, thread 1 (leader) and thread 2, and the user only resumes
thread 2.  Then, after a while, the user decides to resume thread 1, but
that coincides exactly with thread 2 exec'ing and changing its tid to the
tgid...  Boom, GDB ends up resuming the exec'ed thread by mistake, without
processing the PTRACE_EVENT_EXEC.

Always using PTRACE_SYSCALL instead of PTRACE_CONT and looking for exec
syscall entry doesn't work as a workaround either, because:

#1, it stops before the exec has actually happened, and the exec may fail
#2, it would be horribly inefficient, as it would stop all threads for all
    syscalls, including non-exec ones.
#3, it can't replace PTRACE_SINGLESTEP

I think that we would need a new event, similar to PTRACE_EVENT_EXEC, but
one that is reported after all other threads in the process have exited
(and reported their exit) and _before_ the tid of the execing thread is
changed.  Let's call it PTRACE_EVENT_ALMOST_EXEC.  At this point, like with
PTRACE_EVENT_EXEC (before other threads are reaped), the previous leader
would be in zombie state.  The process would have already loaded the new
address space when PTRACE_EVENT_ALMOST_EXEC is reported.  So if GDB happens
to try to resume the leader just while some other thread execs, it would
fail to ptrace-resume it with ESRCH, because the leader would be a zombie.
GDB would react to PTRACE_EVENT_ALMOST_EXEC like it reacts to
PTRACE_EVENT_EXEC -- by loading the new symbols, and installing breakpoints
in the new address space.  Except it wouldn't destroy the non-leader thread
yet -- that would happen on PTRACE_EVENT_EXEC.
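
The two-thread scenario and the proposed event can be illustrated with a
toy model.  This is only a sketch under stated assumptions: it is plain
Python bookkeeping, not kernel or GDB code; ToyThreadGroup, exec_today,
exec_proposed and race are hypothetical names; the tids 100 and 101 are
made up; and, for brevity, the model lets the old leader vanish
immediately instead of modelling the reaping of other threads.
PTRACE_EVENT_ALMOST_EXEC is the hypothetical event from the comment, not
an existing ptrace event.

```python
import errno


class ToyThreadGroup:
    """Toy model of one traced thread group; not real kernel or GDB code."""

    def __init__(self, tgid, other_tids):
        self.tgid = tgid
        # Every thread starts out ptrace-stopped; the leader's tid is the tgid.
        self.state = {tid: "stopped" for tid in (tgid, *other_tids)}
        self.pending = []  # (event, tid) pairs the tracer has not processed

    def resume(self, tid):
        """Model a ptrace resume request: only a stopped thread can be resumed."""
        if self.state.get(tid) != "stopped":
            return -errno.ESRCH
        self.state[tid] = "running"
        return 0

    def exec_today(self, tid):
        """Current behaviour: the execing thread takes over the tgid first."""
        self.state = {self.tgid: "stopped"}
        self.pending.append(("PTRACE_EVENT_EXEC", self.tgid))

    def exec_proposed(self, tid):
        """Hypothetical PTRACE_EVENT_ALMOST_EXEC: tid unchanged, leader zombie."""
        self.state = {self.tgid: "zombie", tid: "stopped"}
        self.pending.append(("PTRACE_EVENT_ALMOST_EXEC", tid))


def race(exec_method):
    """Replay the scenario: resume thread 2, it execs, then resume thread 1."""
    group = ToyThreadGroup(tgid=100, other_tids=[101])
    group.resume(101)                  # the user resumes only thread 2
    getattr(group, exec_method)(101)   # thread 2 execs
    rc = group.resume(100)             # the user now resumes thread 1 by tid
    return rc, group.state, group.pending
```

In this model, race("exec_today") reports a successful resume of tid 100
while the exec event is still pending, which is the mistaken resume
described above.  race("exec_proposed") instead fails with -ESRCH,
because tid 100 still names the zombie leader, so the tracer gets a
chance to process the event first.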