From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <simon.marchi@polymtl.ca>
Received: from smtp.polymtl.ca (smtp.polymtl.ca [132.207.4.11])
 by sourceware.org (Postfix) with ESMTPS id 71A183858D33
 for <gdb-patches@sourceware.org>; Wed, 13 Jan 2021 06:23:49 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 71A183858D33
Received: from simark.ca (simark.ca [158.69.221.121]) (authenticated bits=0)
 by smtp.polymtl.ca (8.14.7/8.14.7) with ESMTP id 10D6NgSd007612
 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); 
 Wed, 13 Jan 2021 01:23:47 -0500
DKIM-Filter: OpenDKIM Filter v2.11.0 smtp.polymtl.ca 10D6NgSd007612
Received: from [10.0.0.11] (192-222-157-6.qc.cable.ebox.net [192.222.157.6])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by simark.ca (Postfix) with ESMTPSA id 5E3971EF78;
 Wed, 13 Jan 2021 01:23:42 -0500 (EST)
Subject: Re: [PATCH 08/12] prepare_for_detach and ongoing displaced stepping
To: Pedro Alves <pedro@palves.net>, gdb-patches@sourceware.org
References: <20210113011543.2047449-1-pedro@palves.net>
 <20210113011543.2047449-9-pedro@palves.net>
From: Simon Marchi <simon.marchi@polymtl.ca>
Message-ID: <ad83966e-651c-caff-8442-82e45872984c@polymtl.ca>
Date: Wed, 13 Jan 2021 01:23:42 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.6.0
MIME-Version: 1.0
In-Reply-To: <20210113011543.2047449-9-pedro@palves.net>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Poly-FromMTA: (simark.ca [158.69.221.121]) at Wed, 13 Jan 2021 06:23:42 +0000
X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A,
 RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 13 Jan 2021 06:23:53 -0000

On 2021-01-12 8:15 p.m., Pedro Alves wrote:
> I noticed that "detach" while a program was running sometimes resulted
> in the process crashing.  I tracked it down to this change to
> prepare_for_detach in commit 187b041e ("gdb: move displaced stepping
> logic to gdbarch, allow starting concurrent displaced steps"):
> 
>     /* Is any thread of this process displaced stepping?  If not,
>        there's nothing else to do.  */
>  -  if (displaced->step_thread == nullptr)
>  +  if (displaced_step_in_progress (inf))
>       return;
> 
> The problem above is that the condition was inadvertently flipped.  It
> should have been:
> 
>    if (!displaced_step_in_progress (inf))
> 
> So I fixed it, and wrote a testcase to exercise it.  The testcase has
> a number of threads constantly stepping over a breakpoint, and then
> GDB detaches the process, while threads are running and stepping over
> the breakpoint.  And then I was surprised that my testcase would hang
> -- GDB would get stuck in an infinite loop in prepare_for_detach,
> here:
> 
>   while (displaced_step_in_progress (inf))
>     {
>       ...
> 
> What is going on is that since we now have two displaced stepping
> buffers, as one displaced step finishes, GDB starts another, and
> there's another one already in progress, and on and on, so the
> displaced_step_in_progress condition never turns false.  This happens
> because we go via the whole handle_inferior_event, which tries to
> start new step overs when one finishes.  And also because while we
> remove breakpoints from the target before prepare_for_detach is
> called, handle_inferior_event ends up calling insert_breakpoints via
> e.g. keep_going.
> 
> Thinking through all this, I came to the conclusion that going through
> the whole handle_inferior_event isn't ideal.  A _lot_ is done by that
> function, e.g., some thread may get a signal which is passed to the
> inferior, and gdb decides to try to get over the signal handler, which
> reinstalls breakpoints.  Or some process may exit.  We can end up
> reporting these events via normal_stop while detaching, maybe end up
> running some breakpoint commands, or maybe even something runs an
> inferior function call.  Etc.  All this after the user has already
> declared they don't want to debug the process anymore, by asking to
> detach.
> 
> I came to the conclusion that it's better to do the minimal amount of
> work possible, in a more controlled fashion, without going through
> handle_inferior_event.  So in the new approach implemented by this
> patch, if there are threads of the inferior that we're detaching in
> the middle of a displaced step, stop them, and cancel the displaced
> step.  This is basically what stop_all_threads already does, via
> wait_one and (the now factored out) handle_one, so I'm reusing those.
> 
> gdb/ChangeLog:
> 
> 	* infrun.c (struct wait_one_event): Move higher up.
> 	(prepare_for_detach): Abort in-progress displaced steps instead of
> 	letting them complete.
> 	(handle_one): If the inferior is detaching, don't add the thread
> 	back to the global step-over chain.
> 	(restart_threads): Don't restart threads if detaching.
> 	(handle_signal_stop): Remove inferior::detaching reference.
> ---
>  gdb/infrun.c | 123 +++++++++++++++++++++++++++------------------------
>  1 file changed, 65 insertions(+), 58 deletions(-)
> 
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index fc7ba745737..12e1564c090 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -3551,6 +3551,22 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs,
>    return false;
>  }
>  
> +/* An event reported by wait_one.  */
> +
> +struct wait_one_event
> +{
> +  /* The target the event came out of.  */
> +  process_stratum_target *target;
> +
> +  /* The PTID the event was for.  */
> +  ptid_t ptid;
> +
> +  /* The waitstatus.  */
> +  target_waitstatus ws;
> +};
> +
> +static bool handle_one (const wait_one_event &event);
> +
>  /* Prepare and stabilize the inferior for detaching it.  E.g.,
>     detaching while a thread is displaced stepping is a recipe for
>     crashing it, as nothing would readjust the PC out of the scratch
> @@ -3561,56 +3577,60 @@ prepare_for_detach (void)
>  {
>    struct inferior *inf = current_inferior ();
>    ptid_t pid_ptid = ptid_t (inf->pid);
> -
> -  /* Is any thread of this process displaced stepping?  If not,
> -     there's nothing else to do.  */
> -  if (displaced_step_in_progress (inf))
> -    return;
> -
> -  infrun_debug_printf ("displaced-stepping in-process while detaching");
> +  scoped_restore_current_thread restore_thread;
>  
>    scoped_restore restore_detaching = make_scoped_restore (&inf->detaching, true);
>  
> -  while (displaced_step_in_progress (inf))
> +  /* Remove all threads of INF from the global step-over chain.  We
> +     want to stop any ongoing step-over, not start any new one.  */
> +  thread_info *next;
> +  for (thread_info *tp = global_thread_step_over_chain_head;
> +       tp != nullptr;
> +       tp = next)
>      {
> -      struct execution_control_state ecss;
> -      struct execution_control_state *ecs;
> -
> -      ecs = &ecss;
> -      memset (ecs, 0, sizeof (*ecs));
> +      next = global_thread_step_over_chain_next (tp);
> +      if (tp->inf == inf)
> +	global_thread_step_over_chain_remove (tp);
> +    }
>  
> -      overlay_cache_invalid = 1;
> -      /* Flush target cache before starting to handle each event.
> -	 Target was running and cache could be stale.  This is just a
> -	 heuristic.  Running threads may modify target memory, but we
> -	 don't get any event.  */
> -      target_dcache_invalidate ();
> +  if (displaced_step_in_progress (inf))
> +    {
> +      infrun_debug_printf ("displaced-stepping in-process while detaching");
>  
> -      do_target_wait (pid_ptid, ecs, 0);
> +      /* Stop threads currently displaced stepping, aborting it.  */
>  
> -      if (debug_infrun)
> -	print_target_wait_results (pid_ptid, ecs->ptid, &ecs->ws);
> +      for (thread_info *thr : inf->non_exited_threads ())
> +	{
> +	  if (thr->displaced_step_state.in_progress ())
> +	    {
> +	      if (thr->executing)
> +		{
> +		  if (!thr->stop_requested)
> +		    {
> +		      target_stop (thr->ptid);
> +		      thr->stop_requested = true;
> +		    }
> +		}
> +	      else
> +		thr->resumed = false;
> +	    }
> +	}
>  
> -      /* If an error happens while handling the event, propagate GDB's
> -	 knowledge of the executing state to the frontend/user running
> -	 state.  */
> -      scoped_finish_thread_state finish_state (inf->process_target (),
> -					       minus_one_ptid);
> +      while (displaced_step_in_progress (inf))
> +	{
> +	  wait_one_event event;
>  
> -      /* Now figure out what to do with the result of the result.  */
> -      handle_inferior_event (ecs);
> +	  event.target = inf->process_target ();
> +	  event.ptid = do_target_wait_1 (inf, pid_ptid, &event.ws, 0);
>  
> -      /* No error, don't finish the state yet.  */
> -      finish_state.release ();
> +	  if (debug_infrun)
> +	    print_target_wait_results (pid_ptid, event.ptid, &event.ws);
>  
> -      /* Breakpoints and watchpoints are not installed on the target
> -	 at this point, and signals are passed directly to the
> -	 inferior, so this must mean the process is gone.  */
> -      if (!ecs->wait_some_more)
> -	{
> -	  restore_detaching.release ();
> -	  error (_("Program exited while detaching"));
> +	  handle_one (event);
>  	}
> +
> +      /* It's OK to leave some of the threads of INF stopped, since
> +	 they'll be detached shortly.  */
>      }
>  }

Thanks, that all LGTM.  I thought that maybe we could add a last gdb_assert
to check that there is no thread of the inferior in the global step over
chain at the end, to make sure that while handling the events we haven't
enqueued one.  But that's not so important.

Simon