From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 119310 invoked by alias); 19 Jan 2017 13:22:31 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 119293 invoked by uid 89); 19 Jan 2017 13:22:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-5.1 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=defensive, paused X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 19 Jan 2017 13:22:19 +0000 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 88E5A85542; Thu, 19 Jan 2017 13:22:19 +0000 (UTC) Received: from [127.0.0.1] (ovpn04.gateway.prod.ext.phx2.redhat.com [10.5.9.4]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v0JDMFpp025939; Thu, 19 Jan 2017 08:22:16 -0500 Subject: Re: [PATCH 1/2] gdbserver: catch fetch registers error To: Markus Metzger , gdb-patches@sourceware.org References: <1481039697-17596-1-git-send-email-markus.t.metzger@intel.com> <1481039697-17596-2-git-send-email-markus.t.metzger@intel.com> Cc: Daniel Jacobowitz From: Pedro Alves Message-ID: <07c4f2ec-25b7-7a42-bc57-a64045a65f9e@redhat.com> Date: Thu, 19 Jan 2017 13:22:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <1481039697-17596-2-git-send-email-markus.t.metzger@intel.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-SW-Source: 2017-01/txt/msg00378.txt.bz2 On 12/06/2016 03:54 PM, Markus Metzger wrote: > When the PTRACE_PEEKUSER ptrace request to read registers fails, gdbserer throws > an error that is caught in captured_main, where it causes a E01 error packet to > be sent and gdbserer to quit (if --once was specified) or the event loop to be > re-started (otherwise). > > We may get such ptrace errors when trying to fetch registers for an exited or > running thread. There are checks in GDB that check those conditions and throw > meaningful error messages before we could run into the above ptrace error, > e.g. thread.c:validate_registers_access. > > I ran into a new case and, rather than adding another call to > validate_registers_access in GDB, I propose to catch the error already when > handling the 'g' packet in gdbserver and reply with an error packet - assuming > that gdbserver's internal state is still intact. So there are two changes here: #1 - don't kill the debug session just because of register reading failure. I.e., handle it gracefully. The case of trying to read registers from a thread that is running is always a client bug. GDB should know which threads it resumed, and checks like the validate_registers_access you found should prevent such accesses reaching the server. I don't object to being defensive in gdbserver if that's not too invasive, but if we find such cases in GDB, we should fix them. However, the case of trying to read from an exited thread can well happen, and there's nothing the client can do to prevent it. The only thing it can do it handle it gracefully. This is because threads can transition from stopped -> exited without a "running" state in between. And this can happen if some _other_ thread of the inferior process is resumed and that other thread causes the whole process to exit (e.g., calls "exit()") just while gdb/gdbserver is accessing registers of the other supposedly-stopped thread. It can also happen when some child thread other than the one we're trying to access registers from execs, and the thread the client is trying to access registers from is not the main thread (the exec makes the kernel kill all threads, including the ones that were stopped). These sorts of corner cases are exercised by the process-dies-while-detaching.exp / process-dies-while-handling-bp.exp testcase and maybe others I don't recall their names immediately. See also: https://sourceware.org/bugzilla/show_bug.cgi?id=18749 Note process-dies-while-handling-bp.exp is largely kfailed when running against remote, exactly because we don't handle this all to well currently, in both gdbserver, and in gdb common code. #2 - sending a human readable error message back to gdb. > > To not replace a meaningful error message with E01, I'm trying to generate a > useful error message when the error is detected and the exception is thrown. > > It would look like this ... > > gdb) PASS: gdb.btrace/enable-running.exp: continue to breakpoint: cont to 44 > cont& > Continuing. > (gdb) PASS: gdb.btrace/enable-running.exp: cont& > record btrace > warning: Remote failure reply: E.Selected thread is running. > warning: Remote failure reply: E.Selected thread is running. > > ... although in this particular case, I'm going to suppress the warning. So it seems to me that btrace shouldn't even be trying to be enabled on running threads. > > To make this look a bit nicer, we could consider stripping the "E." or the > entire "Remote failure reply: E." when (re-)throwing the error inside GDB in > remote.c. Agreed. > @@ -5692,7 +5692,24 @@ fetch_register (const struct usrregs_info *usrregs, > (PTRACE_TYPE_ARG3) (uintptr_t) regaddr, (PTRACE_TYPE_ARG4) 0); > regaddr += sizeof (PTRACE_XFER_TYPE); > if (errno != 0) > - error ("reading register %d: %s", regno, strerror (errno)); > + { > + /* ESRCH could mean that the thread is not traced, exited, or is not > + stopped. */ > + if (errno == ESRCH) > + { > + struct lwp_info *lwp = get_thread_lwp (current_thread); > + > + if (!lwp_is_stopped (lwp)) > + error (_("Selected thread is running.")); > + > + if (lwp_is_marked_dead (lwp)) > + error (_("Selected thread has terminated.")); The order of the checks seems backwards -- if the lwp is dead, lwp_is_stopped is meaningless? But, if either of these conditions is true, and the errors are reached, then it looks like an internal gdbserver error to me. lwp_is_stopped, only checks the lwp->stopped flag. If !lwp->stopped, then this looks like a case of "we should not have reached here in the first place". Similarly for lwp_is_marked_dead -- if the lwp is known dead, then why did we try to ptrace it? Something higher up in the call chain should have errored out earlier, IMO. If we want gdbserver to be defensive against GDB mistakes, we can look at the request as coming from server.c, to handle a g/G packet. So in that case it's gdbserver's knowledge of gdb's perspective of whether the thread is running that counts here. I.e., a 'g' packet that tries to read from a thread that gdb resumed before should error out even if the thread happens to be momentarily paused due to some internal gdbserver event handling. That suggests to me that for the "running" case, server.c's g/G packet handling should be erroring out before calling into the backend, if: if (thread->last_resume_kind != resume_stop || thread->last_status.kind == TARGET_WAITKIND_IGNORE) See the linux-low.c:lwp_resumed function, which we could move to common code. Now, even with that in place, we'd still need to handle the unpredictable "stopped -> exited" transitions. And those can only be handled by checking for failure after the fact... But with an upfront "!running" check, then we can assume that ESRCH means the thread is gone, and thus unconditionally throw the error (_("Selected thread has terminated.")); error. Note that what gdbserver thinks of "selected" thread has no 1-1 relation with what the user thinks as selected thread, so that may be a bit confusing. E.g., the user may well have thread 2, selected, while that error triggers for some other thread because gdb decided to temporarily switch to some other thread, internally, e.g., to iterate over threads and read their registers. > + } > + > + /* Report a generic error if we could not determine the exact > + reason. */ > + error (_("Could not read register %d: %s."), regno, strerror (errno)); > + } > } > > if (the_low_target.supply_ptrace_register) > diff --git a/gdb/gdbserver/server.c b/gdb/gdbserver/server.c > index ef8dd03..3064b4f 100644 > --- a/gdb/gdbserver/server.c > +++ b/gdb/gdbserver/server.c > @@ -4132,8 +4132,22 @@ process_serial_event (void) > write_enn (own_buf); > else > { > - regcache = get_thread_regcache (current_thread, 1); > - registers_to_string (regcache, own_buf); > + TRY > + { > + regcache = get_thread_regcache (current_thread, 1); > + registers_to_string (regcache, own_buf); > + } > + CATCH (exception, RETURN_MASK_ALL) Why RETURN_MASK_ALL ? > + { > + const char *message; > + > + message = exception.message; > + if (message == NULL) > + message = _("Reading registers failed."); > + > + sprintf (own_buf, "E.%s", message); > + } > + END_CATCH > } > } > break; > Thanks, Pedro Alves