From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 70286 invoked by alias); 14 Dec 2018 23:39:15 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 70273 invoked by uid 89); 14 Dec 2018 23:39:15 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 spammy=killing, factorization, posts, itd X-HELO: smtp.polymtl.ca Received: from smtp.polymtl.ca (HELO smtp.polymtl.ca) (132.207.4.11) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 14 Dec 2018 23:39:13 +0000 Received: from simark.ca (simark.ca [158.69.221.121]) (authenticated bits=0) by smtp.polymtl.ca (8.14.7/8.14.7) with ESMTP id wBENd6bu003043 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 14 Dec 2018 18:39:11 -0500 Received: by simark.ca (Postfix, from userid 112) id AE7D81E7B5; Fri, 14 Dec 2018 18:39:06 -0500 (EST) Received: from simark.ca (localhost [127.0.0.1]) by simark.ca (Postfix) with ESMTP id C59781E4C0; Fri, 14 Dec 2018 18:39:03 -0500 (EST) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Date: Fri, 14 Dec 2018 23:39:00 -0000 From: Simon Marchi To: Philippe Waroquiers Cc: Pedro Alves , Simon Marchi , gdb-patches@sourceware.org Subject: Re: [RFA] Factorize killing the children in linux-ptrace.c, and fix a 'process leak'. In-Reply-To: <1541957075.1531.1.camel@skynet.be> References: <20181103201929.10345-1-philippe.waroquiers@skynet.be> <3830a73e-d630-1176-b6e9-00f88ed6468d@ericsson.com> <734d1ee3-c452-bf04-0084-26a59238a139@redhat.com> <1541957075.1531.1.camel@skynet.be> Message-ID: <1d75a834a31293c79f8ffb6188a059d6@polymtl.ca> X-Sender: simon.marchi@polymtl.ca User-Agent: Roundcube Webmail/1.3.6 X-IsSubscribed: yes X-SW-Source: 2018-12/txt/msg00165.txt.bz2 On 2018-11-11 12:24, Philippe Waroquiers wrote: > On Sat, 2018-11-10 at 18:29 +0000, Pedro Alves wrote: >> There's some long and ugly history around PTRACE_KILL vs SIGKILL. >> See gdb/linux-nat.c kill_wait_one_lwp and kill_one_lwp, and the >> comments >> within those functions. I don't know whether the kernels those were >> for >> are relevant any more. Likely using git blame and finding the >> corresponding >> patch posts would shed some more light. Whatever we do, it'd be nice >> to >> make gdb/linux-nat.c use the unified code as well. > > I have done some digging in the source code and searched some > more info using git blame. > => I found 7 different linux logics related to kill(SIGKILL) > and ptrace(PTRACE_KILL). > I found only one identification of kernels giving problems. > ChangeLogs/commit msg were not really giving much > background information : these are telling what is being changed, > not why it is being changed. > linux-low.c has the most interesting comments explaining > the kill related problems. > > Below is some updated text with the found info (sorry, you > will have to re-read some parts already in the RFA). > > I have to admit that a 'linux-ptrace.c only factorization' > as suggested in the RFA (at least as a first step, till we have > validated with it that a simpler kill logic is ok on various platforms) > is somewhat less frightening me than reducing all 7 different > logics to a single one. > (there is a table at the end of this mail that summarises all > 7 logics). > > I must say that all this is killing me :). > Maybe I would feel better if I would just have added my own > local new 8th kill logic where I need it :). > > Philippe > > > nat/linux-ptrace.c has 3 different logics to kill a child process. > So, this patch factorizes killing a child in the function kill_child. > > The 3 different logics are: > * linux_ptrace_test_ret_to_nx is calling both kill (child, SIGKILL) >   and ptrace (PTRACE_KILL, child, ...), and then is calling once >   waitpid. > * linux_check_ptrace_features is calling ptrace (PTRACE_KILL, child, > ...) >   + my_waitpid in a loop, as long as the waitpid status was WIFSTOPPED. > * linux_test_for_tracefork is calling once ptrace (PTRACE_KILL, child, > ...) >   + my_waitpid. > > The linux ptrace documentation indicates that PTRACE_KILL is > deprecated, > and tells to not use it, as it might return success but not kill the > tracee. > The documentation indicates to send SIGKILL directly. > > I suspect that linux_ptrace_test_ret_to_nx calls both kill and ptrace > just > to be sure ... > git blame indicates that the call to kill(SIGKILL) was added in > addition to the > pre-existing ptrace PTRACE_KILL. The commit log also indicates to > ignore > PTRACE_KILL errors. But there is no explanation why adding the SIGKILL > call > was needed, and why PTRACE_KILL was kept. Not sure if you got that far in your archeology, but here is the discussion about it: https://sourceware.org/ml/archer/2011-q1/msg00102.html which is a reply to: https://sourceware.org/ml/archer/2010-q3/msg00209.html Realistically, we could probably just get rid of using PTRACE_KILL. Please push the patch. I think it's totally fine to do one small step at the time. As you said, it's already a significant improvement. Simon