Re: [RFA] Factorize killing the children in linux-ptrace.c, and fix a 'process leak'.

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Pedro Alves <palves@redhat.com>
To: Simon Marchi <simon.marchi@ericsson.com>,
	Philippe Waroquiers <philippe.waroquiers@skynet.be>,
	"gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
Subject: Re: [RFA] Factorize killing the children in linux-ptrace.c, and fix a 'process leak'.
Date: Sat, 10 Nov 2018 18:29:00 -0000	[thread overview]
Message-ID: <734d1ee3-c452-bf04-0084-26a59238a139@redhat.com> (raw)
In-Reply-To: <3830a73e-d630-1176-b6e9-00f88ed6468d@ericsson.com>

On 11/09/2018 09:44 PM, Simon Marchi wrote:
> On 2018-11-03 4:19 p.m., Philippe Waroquiers wrote:
>> Running the gdb testsuite under Valgrind started to fail after 100+ tests,
>> due to out of memory caused by lingering processes.
>>
>> The lingering processes are caused by the combination
>> of a limitation in Valgrind signal handling when using PTRACE_TRACEME
>> and a (minor) bug in GDB.
>>
>> The Valgrind limitation is : when a process is ptraced and raises
>> a signal, Valgrind will replace the raised signal by SIGSTOP as other
>> signals are masked by Valgrind when executing a system call.
>> Removing this limitation seems far to be trivial, valgrind signal
>> handling is very complex.
>>
>> Due to this valgrind limitation, GDB linux_ptrace_test_ret_to_nx gets
>> a SIGSTOP signal instead of the expected SIGTRAP or SIGSEGV.
>> In such a case, linux_ptrace_test_ret_to_nx does an early return, but
>> does not kill the child (running under valgrind), child stays in a STOP-ped
>> state.
>> These lingering processes then eat the available system memory,
>> till launching a new process starts to fail.
>>
>> This patch fixes the GDB minor bug by killing the child in case
>> linux_ptrace_test_ret_to_nx does an early return.
>>
>> nat/linux-ptrace.c has 3 different logics to kill a child process.
>> So, this patch factorizes killing a child in the function kill_child.
>>
>> The 3 different logics are:
>> * linux_ptrace_test_ret_to_nx is calling both kill (child, SIGKILL)
>>   and ptrace (PTRACE_KILL, child, ...), and then is calling once
>>   waitpid.
>> * linux_check_ptrace_features is calling ptrace (PTRACE_KILL, child, ...)
>>   + my_waitpid in a loop, as long as the waitpid status was WIFSTOPPED.
>> * linux_test_for_tracefork is calling once ptrace (PTRACE_KILL, child, ...)
>>   + my_waitpid.
>>
>> The linux ptrace documentation indicates that PTRACE_KILL is deprecated,
>> and tells to not use it, as it might return success but not kill the tracee.
>> The documentation indicates to send SIGKILL directly.
>>
>> I suspect that linux_ptrace_test_ret_to_nx calls both kill and ptrace just
>> to be sure ...
>> I suspect that linux_check_ptrace_features calls ptrace in a loop
>> to bypass the PTRACE_KILL limitation.
>> And it looks like linux_test_for_tracefork does not handle the PTRACE_KILL
>> limitation.
>> Also, 2 of the 3 logics are calling my_waitpid, which seems better,
>> as this is protecting the waitpid syscall against EINTR.
>>
>> So, the logic in kill_child is just using kill (child, SIGKILL)
>> + my_waitpid, and then does a few verifications to see everything worked
>> accordingly to the plan.
>>
>> Tested on Debian/x86_64.
>>
>> 2018-11-03  Philippe Waroquiers  <philippe.waroquiers@skynet.be>
>>
>> 	* nat/linux-ptrace.c (kill_child): New function.
>> 	(linux_ptrace_test_ret_to_nx): Use kill_child instead of local code.
>> 	Add a call to kill_child in case of early return after fork.
>> 	(linux_check_ptrace_features): Use kill_child instead of local code.
>> 	(linux_test_for_tracefork): Likewise.
> 
> This makes sense to me, but I'd like to invoke pedro(), see if he sees
> anything wrong with it.

There's some long and ugly history around PTRACE_KILL vs SIGKILL.
See gdb/linux-nat.c kill_wait_one_lwp and kill_one_lwp, and the comments
within those functions.  I don't know whether the kernels those were for
are relevant any more.  Likely using git blame and finding the corresponding
patch posts would shed some more light.  Whatever we do, it'd be nice to
make gdb/linux-nat.c use the unified code as well.

Thanks,
Pedro Alves

next prev parent reply	other threads:[~2018-11-10 18:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-03 20:19 Philippe Waroquiers
2018-11-09 22:22 ` Simon Marchi
2018-11-10 18:29   ` Pedro Alves [this message]
2018-11-11 17:24     ` Philippe Waroquiers
2018-11-25 23:13       ` Philippe Waroquiers
2018-12-10 20:46         ` PING^2 " Philippe Waroquiers
2018-12-14 23:39       ` Simon Marchi
2018-12-16 20:27         ` Philippe Waroquiers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=734d1ee3-c452-bf04-0084-26a59238a139@redhat.com \
    --to=palves@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=philippe.waroquiers@skynet.be \
    --cc=simon.marchi@ericsson.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).