Date: Wed, 23 Feb 2011 17:24:00 -0000
From: Oleg Nesterov
To: Jan Kratochvil
Cc: Roland McGrath, archer@sourceware.org
Subject: Re: safe PTRACE_ATTACH

On 02/23, Jan Kratochvil wrote:
>
> notice: Moved thread to the Archer list.
>
> I can confirm this problem exists.
>
> AFAIK on recent kernels this whole "trick" (if-stopped then tkill(SIGSTOP) and
> PTRACE_CONT(0)) is not needed as it now works even for `eaten-out SIGSTOP
> notifications'.

It is still needed, but for a quite different reason. See the test-case in
http://marc.info/?l=linux-kernel&m=129676623323195

The original reason for this bug was fixed long ago. IOW, the trick is still
needed, but only in an unlikely corner case. That case is easy to fix (although
the simple fix is not clean), and after that the trick is no longer needed.
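A minimal C sketch of the attach "trick" quoted above, roughly following the order Jan describes: attach, check whether the task sits in a job-control stop, and if so tkill(SIGSTOP) plus PTRACE_CONT(0) so that a SIGSTOP stop is reported to the tracer. The names `status_text_is_stopped()`, `pid_is_stopped()` and `attach_with_stop_trick()` are hypothetical, this is not gdb's code, and the assumption that a job-control stop reads as "T (stopped)" in /proc/PID/status is exactly that: the state strings differ between kernel versions. Actually attaching needs Linux and ptrace permission over the target.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: does this /proc/PID/status text describe a task
   in a job-control stop?  ASSUMPTION: the state reads "T (stopped)". */
bool status_text_is_stopped(const char *status)
{
    const char *p = strstr(status, "State:");

    if (p == NULL)
        return false;
    p += strlen("State:");
    while (*p == ' ' || *p == '\t')
        p++;
    return strncmp(p, "T (stopped)", strlen("T (stopped)")) == 0;
}

/* Hypothetical helper: read /proc/PID/status and apply the check above.
   Returns false if the file cannot be read. */
bool pid_is_stopped(pid_t pid)
{
    char path[64], buf[4096];
    size_t n;
    FILE *f;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%d/status", (int) pid);
    f = fopen(path, "r");
    if (f == NULL)
        return false;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return status_text_is_stopped(buf);
}

/* Sketch of the trick: attach; if the task is in a job-control stop,
   queue a fresh SIGSTOP with tkill() and resume it with PTRACE_CONT(0)
   so the tracer gets a SIGSTOP report.  Returns 0 once
   WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP, -1 otherwise. */
int attach_with_stop_trick(pid_t tid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) == -1)
        return -1;
    if (pid_is_stopped(tid)) {
        syscall(SYS_tkill, tid, SIGSTOP);
        ptrace(PTRACE_CONT, tid, NULL, (void *) 0);
    }
    /* __WALL also waits for clone()d (non-leader) threads. */
    if (waitpid(tid, &status, __WALL) != tid)
        return -1;
    return (WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP) ? 0 : -1;
}
```

The /proc parsing is split out so it can be checked without attaching to anything; only attach_with_stop_trick() touches ptrace.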
> But to be compatible with the older kernels (despite having this race there)
> what do you suggest? Checking /proc/version seems too fragile to me.
> GDB could do another ptrace test (like linux_test_for_tracesysgood etc.).

Oh, I do not know what the best check would be. But anyway this part is "easy";
I mean, we can do this somehow. The problem is that I do not see how we can
modify the kernel without breaking the unmodified gdb.

Oh. You know, gdb looks completely broken when it comes to job-control (jctl)
signals ;) Like the kernel. At least in all-stop mode. I don't know how to
explain why, so please see the example.

Absolutely trivial test-case:

	#include <pthread.h>
	#include <unistd.h>

	/* Both the main thread and the sub-thread just sleep in pause(). */
	void *tf(void *arg)
	{
		for (;;)
			pause();
	}

	int main(void)
	{
		pthread_t pt;

		pthread_create(&pt, NULL, tf, NULL);
		tf(NULL);
		return 0;
	}

Now,

	GNU gdb (GDB) 7.1
	...
	(gdb) attach 29412
	Attaching to program: /tmp/0/mt, process 29412
	Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
	[Thread debugging using libthread_db enabled]
	[New Thread 0x41b54950 (LWP 29413)]
	Loaded symbols for /lib64/libpthread.so.0
	Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
	Loaded symbols for /lib64/libc.so.6
	Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
	Loaded symbols for /lib64/ld-linux-x86-64.so.2
	0x00000033af60e57d in pause () from /lib64/libpthread.so.0
	(gdb) c
	Continuing.

Let's send SIGSTOP to 29067:

	$ kill -STOP 29067

	Program received signal SIGSTOP, Stopped (signal).
	0x00000033af60e57d in pause () from /lib64/libpthread.so.0
	(gdb)

Very nice, but what does gdb do?
	--- SIGCHLD (Child exited) @ 0 (0) ---
	wait4(-1, 0x7ffffab89b4c, WNOHANG|__WCLONE, NULL) = 0
	wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WNOHANG, NULL) = 29412
	tkill(29412, SIG_0) = 0
	tkill(29413, SIGSTOP) = 0
	wait4(29413, 0x7ffffab898b4, 0, NULL) = -1 ECHILD (No child processes)
	wait4(29413, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WCLONE, NULL) = 29413

Note this tkill(SIGSTOP) to the sub-thread! Now,

	(gdb) c
	Continuing.

	Program received signal SIGSTOP, Stopped (signal).
	0x00000033af60e57d in pause () from /lib64/libpthread.so.0
	(gdb) c
	Continuing.

	Program received signal SIGSTOP, Stopped (signal).
	[Switching to Thread 0x41b54950 (LWP 29413)]
	0x00000033af60e57d in pause () from /lib64/libpthread.so.0
	(gdb) c
	Continuing.

	Program received signal SIGSTOP, Stopped (signal).
	[Switching to Thread 0x7f00007be6f0 (LWP 29412)]
	0x00000033af60e57d in pause () from /lib64/libpthread.so.0
	(gdb) c
	Continuing.

	Program received signal SIGSTOP, Stopped (signal).
	[Switching to Thread 0x41b54950 (LWP 29413)]
	0x00000033af60e57d in pause () from /lib64/libpthread.so.0
	(gdb)

and so on, forever. Every time, it does

	ptrace(PTRACE_CONT, 29413, 0x1, SIG_0) = 0
	ptrace(PTRACE_CONT, 29412, 0x1, SIGSTOP) = 0
	--- SIGCHLD (Child exited) @ 0 (0) ---
	wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WNOHANG|__WCLONE, NULL) = 29413
	tkill(29413, SIG_0) = 0
	tkill(29412, SIGSTOP) = 0
	wait4(29412, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 29412

with the obvious result: every "c" hands the SIGSTOP back to the inferior, so it
stops again and gdb reports yet another SIGSTOP.

"signal SIGSTOP" (instead of "c") does not work either, for the same reason.

Oleg.
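The steady-state ping-pong in the last trace can be sketched as a toy model of the debugger's bookkeeping. This is not gdb code: `struct lwp`, `resume_all()` and `first_reporter()` are hypothetical names, and `first_reporter()` hard-codes an assumption read off the steady-state trace (the group stop started by the passed-back SIGSTOP is first reported by the other LWP). It does not model the first stop after `kill -STOP`, nor any real kernel behaviour.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

#define NLWP 2

/* What the (hypothetical) debugger model remembers about each LWP. */
struct lwp {
    int tid;
    int pass_sig;   /* signal handed back on PTRACE_CONT, 0 if none */
};

/* ASSUMPTION taken from the trace, not from kernel code: after SIGSTOP
   is delivered to one LWP, the group stop is first reported by the
   other LWP. */
static int first_reporter(int delivered_to)
{
    return (delivered_to + 1) % NLWP;
}

/* One "(gdb) c": resume every LWP, passing back its recorded signal,
   then handle the next stop the way the trace shows.  Returns the index
   of the LWP blamed for the next "Program received signal SIGSTOP",
   or -1 if no signal was passed (the inferior would just keep running). */
int resume_all(struct lwp lwps[NLWP])
{
    int delivered_to = -1;

    for (int i = 0; i < NLWP; i++) {
        /* Prints a line in the style of the strace output (in array
           order); nothing is actually traced. */
        printf("ptrace(PTRACE_CONT, %d, 0x1, %s)\n", lwps[i].tid,
               lwps[i].pass_sig == SIGSTOP ? "SIGSTOP" : "SIG_0");
        if (lwps[i].pass_sig == SIGSTOP)
            delivered_to = i;
        lwps[i].pass_sig = 0;
    }
    if (delivered_to < 0)
        return -1;

    int r = first_reporter(delivered_to);
    /* The reporting LWP's SIGSTOP is treated as a real signal for the
       user and will be passed back on the next resume. */
    lwps[r].pass_sig = SIGSTOP;
    /* The other LWPs are stopped with tkill(SIGSTOP); their SIGSTOP
       reports are assumed to be the debugger's own and are discarded,
       so their pass_sig stays 0. */
    return r;
}
```

Starting from { {29412, SIGSTOP}, {29413, 0} }, each call blames the other LWP and exactly one SIGSTOP is always pending for the next resume, so repeated calls never converge to a running inferior.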