From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-136322-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 119310 invoked by alias); 19 Jan 2017 13:22:31 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 119293 invoked by uid 89); 19 Jan 2017 13:22:29 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-5.1 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=defensive, paused
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 19 Jan 2017 13:22:19 +0000
Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id 88E5A85542;	Thu, 19 Jan 2017 13:22:19 +0000 (UTC)
Received: from [127.0.0.1] (ovpn04.gateway.prod.ext.phx2.redhat.com [10.5.9.4])	by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v0JDMFpp025939;	Thu, 19 Jan 2017 08:22:16 -0500
Subject: Re: [PATCH 1/2] gdbserver: catch fetch registers error
To: Markus Metzger <markus.t.metzger@intel.com>, gdb-patches@sourceware.org
References: <1481039697-17596-1-git-send-email-markus.t.metzger@intel.com> <1481039697-17596-2-git-send-email-markus.t.metzger@intel.com>
Cc: Daniel Jacobowitz <drow@false.org>
From: Pedro Alves <palves@redhat.com>
Message-ID: <07c4f2ec-25b7-7a42-bc57-a64045a65f9e@redhat.com>
Date: Thu, 19 Jan 2017 13:22:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <1481039697-17596-2-git-send-email-markus.t.metzger@intel.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-SW-Source: 2017-01/txt/msg00378.txt.bz2

On 12/06/2016 03:54 PM, Markus Metzger wrote:
> When the PTRACE_PEEKUSER ptrace request to read registers fails, gdbserer throws
> an error that is caught in captured_main, where it causes a E01 error packet to
> be sent and gdbserer to quit (if --once was specified) or the event loop to be
> re-started (otherwise).
> 
> We may get such ptrace errors when trying to fetch registers for an exited or
> running thread.  There are checks in GDB that check those conditions and throw
> meaningful error messages before we could run into the above ptrace error,
> e.g. thread.c:validate_registers_access.
> 
> I ran into a new case and, rather than adding another call to
> validate_registers_access in GDB, I propose to catch the error already when
> handling the 'g' packet in gdbserver and reply with an error packet - assuming
> that gdbserver's internal state is still intact.

So there are two changes here:

#1 - don't kill the debug session just because 
     of register reading failure.  I.e., handle it gracefully.

The case of trying to read registers from a thread that is
running is always a client bug.  GDB should know which threads
it resumed, and checks like the validate_registers_access
you found should prevent such accesses reaching the server.
I don't object to being defensive in gdbserver if that's
not too invasive, but if we find such cases in GDB, we should
fix them.

However, the case of trying to read from an exited thread can
well happen, and there's nothing the client can do to prevent it.
The only thing it can do it handle it gracefully.
This is because threads can transition from stopped -> exited
without a "running" state in between.  And this can happen if
some _other_ thread of the inferior process is resumed and that
other thread causes the whole process to exit (e.g., calls "exit()")
just while gdb/gdbserver is accessing registers of the other
supposedly-stopped thread.  It can also happen when some child
thread other than the one we're trying to access registers
from execs, and the thread the client is trying to access registers
from is not the main thread (the exec makes the kernel kill all
threads, including the ones that were stopped).

These sorts of corner cases are exercised by the
process-dies-while-detaching.exp / process-dies-while-handling-bp.exp
testcase and maybe others I don't recall their names immediately.

See also: https://sourceware.org/bugzilla/show_bug.cgi?id=18749

Note process-dies-while-handling-bp.exp is largely kfailed when
running against remote, exactly because we don't handle this all
to well currently, in both gdbserver, and in gdb common code.

#2 - sending a human readable error message back to gdb.

> 
> To not replace a meaningful error message with E01, I'm trying to generate a
> useful error message when the error is detected and the exception is thrown.
> 
> It would look like this ...
> 
> gdb) PASS: gdb.btrace/enable-running.exp: continue to breakpoint: cont to 44
> cont&
> Continuing.
> (gdb) PASS: gdb.btrace/enable-running.exp: cont&
> record btrace
> warning: Remote failure reply: E.Selected thread is running.
> warning: Remote failure reply: E.Selected thread is running.
> 
> ... although in this particular case, I'm going to suppress the warning.

So it seems to me that btrace shouldn't even be trying to be enabled
on running threads.

> 
> To make this look a bit nicer, we could consider stripping the "E." or the
> entire "Remote failure reply: E." when (re-)throwing the error inside GDB in
> remote.c.

Agreed.

> @@ -5692,7 +5692,24 @@ fetch_register (const struct usrregs_info *usrregs,
>  		(PTRACE_TYPE_ARG3) (uintptr_t) regaddr, (PTRACE_TYPE_ARG4) 0);
>        regaddr += sizeof (PTRACE_XFER_TYPE);
>        if (errno != 0)
> -	error ("reading register %d: %s", regno, strerror (errno));
> +	{
> +	  /* ESRCH could mean that the thread is not traced, exited, or is not
> +	     stopped.  */
> +	  if (errno == ESRCH)
> +	    {
> +	      struct lwp_info *lwp = get_thread_lwp (current_thread);
> +
> +	      if (!lwp_is_stopped (lwp))
> +		error (_("Selected thread is running."));
> +
> +	      if (lwp_is_marked_dead (lwp))
> +		error (_("Selected thread has terminated."));

The order of the checks seems backwards -- if the lwp is dead,
lwp_is_stopped is meaningless?  But, if either of these conditions
is true, and the errors are reached, then it looks like
an internal gdbserver error to me.

lwp_is_stopped, only checks the lwp->stopped flag.
If !lwp->stopped, then this looks like a case of "we should not
have reached here in the first place".

Similarly for lwp_is_marked_dead -- if the lwp is known dead,
then why did we try to ptrace it?  Something higher up in the
call chain should have errored out earlier, IMO.

If we want gdbserver to be defensive against GDB mistakes,
we can look at the request as coming from server.c, to handle a 
g/G packet.  So in that case it's gdbserver's knowledge of 
gdb's perspective of whether the thread is running that counts
here.  I.e., a 'g' packet that tries to read from a thread
that gdb resumed before should error out even if the
thread happens to be momentarily paused due to some internal
gdbserver event handling.

That suggests to me that for the "running" case, 
server.c's g/G packet handling should be erroring out before
calling into the backend, if:

  if (thread->last_resume_kind != resume_stop
      || thread->last_status.kind == TARGET_WAITKIND_IGNORE)

See the linux-low.c:lwp_resumed function, which we could
move to common code.

Now, even with that in place, we'd still need to handle
the unpredictable "stopped -> exited" transitions.  And those
can only be handled by checking for failure after the fact...
But with an upfront "!running" check, then we can assume that
ESRCH means the thread is gone, and thus unconditionally
throw the 
  error (_("Selected thread has terminated."));
error.

Note that what gdbserver thinks of "selected" thread has
no 1-1 relation with what the user thinks as selected thread,
so that may be a bit confusing.  E.g., the user may well have thread 2,
selected, while that error triggers for some other thread because
gdb decided to temporarily switch to some other thread, internally,
e.g., to iterate over threads and read their registers. 


> +	    }
> +
> +	  /* Report a generic error if we could not determine the exact
> +	     reason.  */
> +	  error (_("Could not read register %d: %s."), regno, strerror (errno));
> +	}
>      }
>  
>    if (the_low_target.supply_ptrace_register)
> diff --git a/gdb/gdbserver/server.c b/gdb/gdbserver/server.c
> index ef8dd03..3064b4f 100644
> --- a/gdb/gdbserver/server.c
> +++ b/gdb/gdbserver/server.c
> @@ -4132,8 +4132,22 @@ process_serial_event (void)
>  	    write_enn (own_buf);
>  	  else
>  	    {
> -	      regcache = get_thread_regcache (current_thread, 1);
> -	      registers_to_string (regcache, own_buf);
> +	      TRY
> +		{
> +		  regcache = get_thread_regcache (current_thread, 1);
> +		  registers_to_string (regcache, own_buf);
> +		}
> +	      CATCH (exception, RETURN_MASK_ALL)

Why RETURN_MASK_ALL ?

> +		{
> +		  const char *message;
> +
> +		  message = exception.message;
> +		  if (message == NULL)
> +		    message = _("Reading registers failed.");
> +
> +		  sprintf (own_buf, "E.%s", message);
> +		}
> +	      END_CATCH
>  	    }
>  	}
>        break;
> 

Thanks,
Pedro Alves