On 2014-12-16 11:53 AM, Pedro Alves wrote: > ... instead of relying on libthread_db. > > I wrote a test that attaches to a program that constantly spawns > short-lived threads, which exposed several issues. This is one of > them. > > On Linux, we need to attach to all threads of a process (thread group) > individually. We currently rely on libthread_db to list the threads, > but that is problematic, because libthread_db relies on reading data > structures out of the inferior (which may well be corrupted). If > threads are being created or exiting just while we try to attach, we > may trip on inconsistencies in the inferior's thread list. To work > around that, when we see a seemingly corrupt list, we currently retry > a few times: > > static void > thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new) > { > ... > if (until_no_new) > { > /* Require 4 successive iterations which do not find any new threads. > The 4 is a heuristic: there is an inherent race here, and I have > seen that 2 iterations in a row are not always sufficient to > "capture" all threads. */ > ... > > That heuristic may well fail, and when it does, we end up with threads > in the program that aren't under GDB's control. That's obviously bad > and results in quite mistifying failures, like e.g., the process dying > for seeminly no reason when a thread that wasn't attached trips on a > breakpoint. > > There's really no reason to rely on libthread_db for this nowadays > when we have /proc mounted. In that case, which is the usual case, we > can list the LWPs from /proc/PID/task/. In fact, GDBserver is already > doing this. The patch factors out that code that knows to walk the > task/ directory out of GDBserver, and makes GDB use it too. > > Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for > them to stop immediately. Instead, we just tag the LWP as having an > expected stop. Because we can only set the ptrace options when the > thread stops, we need a new flag in the lwp structure to keep track of > whether we've already set the ptrace options, just like in GDBserver. > Note that nothing issues any ptrace command to the threads between the > PTRACE_ATTACH and the stop, so this is safe (unlike one scenario > described in gdbserver's linux-low.c). > > When we attach to a program that has threads exiting while we attach, > it's easy to race with a thread just exiting as we try to attach to > it, like: > > #1 - get current list of threads > #2 - attach to each listed thread > #3 - ooops, attach failed, thread is already gone > > As this is pretty normal, we shouldn't be issuing a scary warning in > step #3. > > When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes > we'll see EPERM as well. That happens when the kernel still has the > kernel in its task list, but the thread is marked as dead. "still has the kernel" -> "still has the thread" > Unfortunately, EPERM is ambiguous and we'll get it also on other > scenarios where the thread isn't dead, and in those cases, it's useful > to get a warning. To distiguish the cases, when we get an EPERM > failure, we open /proc/PID/status, and check the thread's state -- if > the /proc file no longer exists, or the state is "Z (Zombie)" or "X > (Dead)", we ignore the EPERM error silently; otherwise, we'll warn. > Unfortunately, there seems to be a kernel race here. Sometimes I get > EPERM, and then the /proc state still indicates "R (Running)"... If > we wait a bit and retry, we do end up seeing X or Z state, or get an > ESRCH. I thought of making GDB retry the attach a few times, but even > with a 500ms wait and 4 retries, I still see the warning sometimes. I > haven't been able to identify the kernel path that causes this yet, > but in any case, it looks like a kernel bug to me. As this just > results failure to suppress a warning that we've been printing since > about forever anyway, I'm just making the test cope with it, and issue > an XFAIL. > > gdb/gdbserver/ > 2014-12-16 Pedro Alves > > * linux-low.c (linux_attach_fail_reason_string): Move to > nat/linux-ptrace.c, and rename. > (linux_attach_lwp): Update comment. > (attach_proc_task_lwp_callback): New function. > (linux_attach): Adjus to rename and use Adjus -> Adjust > linux_proc_attach_tgid_threads. > (linux_attach_fail_reason_string): Delete declaration. > > gdb/ > 2014-12-16 Pedro Alves > > * linux-nat.c (attach_proc_task_lwp_callback): New function. > (linux_nat_attach): Use linux_proc_attach_tgid_threads. > (wait_lwp, linux_nat_filter_event): If not set yet, set the lwp's > ptrace option flags. > * linux-nat.h (struct lwp_info) : New > field. > * nat/linux-procfs.c: Include . > (linux_proc_get_int): New parameter "warn". Handle it. > (linux_proc_get_tgid): Adjust. > (linux_proc_get_tracerpid): Rename to ... > (linux_proc_get_tracerpid_nowarn): ... this. > (linux_proc_pid_get_state): New function, factored out from > (linux_proc_pid_has_state): ... this. Add new parameter "warn" > and handle it. > (linux_proc_pid_is_gone): New function. > (linux_proc_pid_is_stopped): Adjust. > (linux_proc_pid_is_zombie_maybe_warn) > (linux_proc_pid_is_zombie_nowarn): New functions. > (linux_proc_pid_is_zombie): Use > linux_proc_pid_is_zombie_maybe_warn. > (linux_proc_attach_tgid_threads): New function. > * nat/linux-procfs.h (linux_proc_get_tgid): Update comment. > (linux_proc_get_tracerpid): Rename to ... > (linux_proc_get_tracerpid_nowarn): ... this, and update comment. > (linux_proc_pid_is_gone): New declaration. > (linux_proc_pid_is_zombie): Update comment. > (linux_proc_pid_is_zombie_nowarn): New declaration. > (linux_proc_attach_lwp_func): New typedef. > (linux_proc_attach_tgid_threads): New declaration. > * nat/linux-ptrace.c (linux_ptrace_attach_fail_reason): Adjust to > use nowarn functions. > (linux_ptrace_attach_fail_reason_string): Move here from > gdbserver/linux-low.c and rename. > (ptrace_supports_feature): If the current ptrace options are not > known yet, check them now, instead of asserting. > * nat/linux-ptrace.h (linux_ptrace_attach_fail_reason_string): > Declare. I think it makes sense, not that I know anything about it. Simon