From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 756 invoked by alias); 26 May 2014 04:55:40 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 742 invoked by uid 89); 26 May 2014 04:55:38 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-yk0-f170.google.com Received: from mail-yk0-f170.google.com (HELO mail-yk0-f170.google.com) (209.85.160.170) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Mon, 26 May 2014 04:55:37 +0000 Received: by mail-yk0-f170.google.com with SMTP id 10so5810919ykt.1 for ; Sun, 25 May 2014 21:55:35 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.236.88.193 with SMTP id a41mr32672383yhf.22.1401080135157; Sun, 25 May 2014 21:55:35 -0700 (PDT) Received: by 10.170.150.70 with HTTP; Sun, 25 May 2014 21:55:35 -0700 (PDT) In-Reply-To: <1400885374-18915-1-git-send-email-donb@codesourcery.com> References: <1398885482-8449-1-git-send-email-donb@codesourcery.com> <1400885374-18915-1-git-send-email-donb@codesourcery.com> Date: Mon, 26 May 2014 04:55:00 -0000 Message-ID: Subject: Re: [PATCH 0/4 v3] Exec events in gdbserver on Linux From: Doug Evans To: Don Breazeal Cc: "gdb-patches@sourceware.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2014-05/txt/msg00625.txt.bz2 On Fri, May 23, 2014 at 3:49 PM, Don Breazeal wrote: > This patch series is an update to the gdbserver Linux exec event patches = based on review comments for the previous version. The changes from the pr= evious version are summarized below. > > [...] > > RACE CONDITION > > This section explains why the existing techniques for detecting thread ex= it weren't sufficient for gdbserver exec events, necessitating the use of P= TRACE_EVENT_EXIT. The short answer is that there is a race condition in th= e current implementation that can leave a dangling entry in the lwp list (a= n entry that doesn't have a corresponding actual lwp). In this case gdbser= ver will hang waiting for the non-existent lwp to stop. Using the exit eve= nts eliminates this race condition. > > The same race may exist in the native implementation, since the two imple= mentations are similar, but I haven't verified that. It may be difficult t= o concoct a test case that demonstrates the race since the window is so sma= ll. > > Now for the long answer: in my testing I ran into a race condition in che= ck_zombie_leaders, which detects when a thread group leader has exited and = other threads still exist. On the Linux kernel, ptrace/waitpid don't allow= reaping the leader thread until all other threads in the group are reaped.= When the leader exits, it goes zombie, but waitpid will not return an exi= t status until the other threads are gone. When a non-leader thread calls = exec, all other non-leader threads are destroyed, the leader becomes a zomb= ie, and once the "other" threads have been reaped, the execing thread takes= over the leader's pid (tgid) and appears to vanish. In order to handle th= is situation, check_zombie_leaders polls the process state in /proc and del= etes thread group leaders that are in a zombie state. The replacement is a= dded to the lwp list when the exec event is reported. > > See https://sourceware.org/ml/gdb-patches/2011-10/msg00704.html for a mor= e detailed explanation of how this works. > > Here is the relevant part of check_zombie_leaders: > > if (leader_lp !=3D NULL > /* Check if there are other threads in the group, as we may > have raced with the inferior simply exiting. */ > && !last_thread_of_process_p (leader_pid) > && linux_proc_pid_is_zombie (leader_pid)) > { > /* ...large informative comment block... */ > delete_lwp (leader_lp); > > The race occurred when there were two threads in the program, and the non= -leader thread called exec. In this case the leader thread passed through = a very brief zombie state before being replaced by the exec'ing thread as t= he thread group leader. This state transition was asynchronous, with no de= pendency on anything gdbserver did. Because there were no other threads, t= here were no thread exit events, and thus there was no synchronization with= the leader passing through the zombie state and the exec completing. If t= here had been more threads, the leader would remain in the zombie state unt= il they were waited for. In the two-thread case, sometimes the leader exit= was detected and sometimes it wasn't. (Recall that check_zombie_leaders is= polling the state, via linux_proc_pid_is_zombie. The race is between the = leader thread passing through the zombie state and check_zombie_leaders tes= ting for zombie state.) If leader exit wasn't detected, gdbserver would en= d up with a dangl > ing lwp entry that didn't correspond to any real lwp, and would hang wai= ting for that lwp to stop. Using PTRACE_EVENT_EXIT guarantees that the lea= der exit will be detected. > > Note that check_zombie_leaders works just fine for the scenarios where th= e leader thread exits and the other threads continue to run, with no exec c= alls. It is required for systems that don't support the extended ptrace ev= ents. > > The sequence of events resulting in the race condition was this: > > 1) In the program, a CLONE event for a new thread occurs. > > 2) In the program, both threads are resumed once gdbserver has > completed the new thread processing. > > 3) In gdbserver, the function linux_wait_for_event_filtered loops unt= il > waitpid returns "no more events" for the SIGCHLD generated by the > CLONE event. Then linux_wait_for_event_filtered calls > check_zombie_leaders. > > 4) In the program, the new thread is doing the exec. During the exec > the leader thread will pass through a transitory zombie state. If > there were more than two threads, the leader thread would remain a > zombie until all the non-leader, non-exec'ing threads were reaped = by > gdbserver. Since there are no such threads to reap, the leader ju= st > becomes a zombie and is replaced by the exec'ing thread on-the-fly. > (Note that it appears that the leader thread is a zombie just for a > very brief instant.) > > 5) In gdbserver, check_zombie_leaders checks whether an lwp entry > corresponds to a zombie leader thread, and if so, deletes it. Here > is the race: in (4) above, the leader may or may not be in the > transitory zombie state. In the case where a zombie isn't detecte= d, > delete_lwp is not called. > > 6) In gdbserver, an EXEC event is detected and processed. When it ge= ts > ready to report the event to GDB, it calls stop_all_lwps, which se= nds > a SIGSTOP to each lwp in the list and the waits until all the lwps= in > the list have reported a stop event. If the zombie leader wasn't > detected and processed in step (5), gdbserver blocks forever in > linux_wait_for_event_filtered, waiting for the undeleted lwp to be > stopped, which will never happen. Hi. How do I tweak your patch so that I can see the race condition for myself? [I realize the window is small ... I'd just like to play with it.] I tried just disabling PTRACE_O_TRACEEXIT but that causes segvs in the inferior (using non-ldr-exc-1 testcase). Still digging into that.