From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=1ges=B6=redhat.com=aburgess@sourceware.org>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by sourceware.org (Postfix) with ESMTPS id CB6573858D35
	for <gdb-patches@sourceware.org>; Sat, 10 Jun 2023 10:33:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CB6573858D35
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1686393200;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=eL2HzALvb04BiPoSvqIEYLNUBUtWVGwvcrr6vFAOngE=;
	b=hbFDsPWmNWCHCDyacMUBWRbXhCNIR9q06gu5xFmj1dRQcbQb5TwGGUF7mk05zHBWh4fff1
	Rqam4XEYcMTNn9Mqnu3SPlAi/Jloi+tLOD4Sat/Cg5U1FXnWM0/gpTntYNgPZKyQWuaSIq
	36KHdxUm2Yfz/tT4nzMl8qy9QK7E6tA=
Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com
 [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-510-sE2u6GGcNVGi21h201AXVQ-1; Sat, 10 Jun 2023 06:33:18 -0400
X-MC-Unique: sE2u6GGcNVGi21h201AXVQ-1
Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-30d4704fcbaso1504098f8f.0
        for <gdb-patches@sourceware.org>; Sat, 10 Jun 2023 03:33:17 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1686393196; x=1688985196;
        h=mime-version:message-id:date:references:in-reply-to:subject:to:from
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=eL2HzALvb04BiPoSvqIEYLNUBUtWVGwvcrr6vFAOngE=;
        b=lxDMZib4ZI4ooU8mI/SI1+ihq1zzDTrxifZ9YANtMQyjimwvYOHYsuy5Vlqg0BGYtw
         +afP1Iw+dAW1KmfWr1BMhQlYdxpW7ulpByZoPokpWymr6hlr+06rVSkuKWapQrjeE/UL
         w0oansWz0hWiy17qB24vN7gaACEpZ3qXzYYzONOIBXg1nxOCdYNRqLDYyVF8ovduQfeJ
         xL3cKbvsjVor4eLL+pCdDIW78a1vxTAvmJymD9IeStXXy/qpdp5ACinjMO8jOkEdI3Ew
         obYOywfKAkf9eMyGK+2owtdUQT+CKBPEi+z1xr0J5sxheRLCr7K5VGP7p4s1tiYrzg0g
         PtqQ==
X-Gm-Message-State: AC+VfDyxgQk5bHeRyvkDL/lob8xV3lWNFdJtxBChLB+2GP/fTtvrud6b
	a01IU//ErMi0qiV4RRdpqHOyzUQDvpB+blggFibVD98GZ/WXnsHqPYaCcfE+rSmGFHy5JNIbs7k
	EfrzdrPNLV29D/OgCj/ylc9TF3HxPAQ==
X-Received: by 2002:adf:e48f:0:b0:2ef:b4a9:202f with SMTP id i15-20020adfe48f000000b002efb4a9202fmr1050392wrm.69.1686393196580;
        Sat, 10 Jun 2023 03:33:16 -0700 (PDT)
X-Google-Smtp-Source: ACHHUZ7wqEKjT9ZeSOIwmveJIOb+qYKXD29IeYb5SM1MdYVELlqUMhwASJi9SD9bJI+SOhYpDfx7Kw==
X-Received: by 2002:adf:e48f:0:b0:2ef:b4a9:202f with SMTP id i15-20020adfe48f000000b002efb4a9202fmr1050380wrm.69.1686393196208;
        Sat, 10 Jun 2023 03:33:16 -0700 (PDT)
Received: from localhost (92.40.218.124.threembb.co.uk. [92.40.218.124])
        by smtp.gmail.com with ESMTPSA id d17-20020a5d6dd1000000b003095bd71159sm6849293wrz.7.2023.06.10.03.33.15
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Sat, 10 Jun 2023 03:33:15 -0700 (PDT)
From: Andrew Burgess <aburgess@redhat.com>
To: Pedro Alves <pedro@palves.net>, gdb-patches@sourceware.org
Subject: Re: [PATCH 25/31] Ignore failure to read PC when resuming
In-Reply-To: <20221212203101.1034916-26-pedro@palves.net>
References: <20221212203101.1034916-1-pedro@palves.net>
 <20221212203101.1034916-26-pedro@palves.net>
Date: Sat, 10 Jun 2023 11:33:14 +0100
Message-ID: <87ilbvy5cl.fsf@redhat.com>
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain
X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gdb-patches.sourceware.org>

Pedro Alves <pedro@palves.net> writes:

> If GDB sets a GDB_THREAD_OPTION_EXIT option on a thread, and the
> thread exits, the server reports the corresponding thread exit event,
> and forgets about the thread, i.e., removes the exited thread from its
> thread list.
>
> On the GDB side, GDB set the GDB_THREAD_OPTION_EXIT option on a
> thread, GDB delays deleting the thread from its thread list until it
> sees the corresponding thread exit event, as that event needs special
> handling in infrun.
>
> When a thread disappears from the target, but it still exists on GDB's
> thread list, in all-stop RSP mode, it can happen that GDB ends up
> trying to resume such an already-exited-thread that GDB doesn't yet
> know is gone.  When that happens, against GDBserver, typically the
> ongoing execution command fails with this error:

I'm slightly confused here.  If GDB doesn't know the thread has exited
doesn't that mean the server hasn't yet reported the exit, and so should
be holding onto the thread?

I wanted to investigate this a bit more to try and understand more about
what's going on, but I couldn't find a test that was triggering the code
added in this patch.  Do you know if there's a test I can run to see
this issue?

>
>  ...
>  PC register is not available
>  (gdb)
>
> At the remote protocol level, we may see e.g., this:
>
>       [remote] Packet received: w0;p97479.978d2
>     [remote] wait: exit
>     [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>     [infrun] print_target_wait_results:   619641.620754.0 [Thread 619641.620754],
>     [infrun] print_target_wait_results:   status->kind = THREAD_EXITED, exit_status = 0
>     [infrun] handle_inferior_event: status->kind = THREAD_EXITED, exit_status = 0
>     [infrun] context_switch: Switching context from 0.0.0 to 619641.620754.0
>     [infrun] clear_proceed_status_thread: 619641.620754.0
>
> GDB saw an exit event for thread 619641.620754.  After processing it,
> infrun decides to re-resume the target again.  To do that, infrun
> picks some other thread that isn't exited yet from GDB's perspective,
> switches to it, and calls keep_going.  Below, infrun happens to pick
> thread p97479.97479, the leader, which also exited, but GDB doesn't
> know yet:
>
> ...
>     [remote] Sending packet: $Hgp97479.97479#75
>     [remote] Packet received: OK
>     [remote] Sending packet: $g#67
>     [remote] Packet received: xxxxxxxxxxxxxxxxx (...snip...) [1120 bytes omitted]
>     [infrun] reset: reason=handling event
>     [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target remote, no resumed threads
>   [infrun] fetch_inferior_event: exit
>   PC register is not available
>   (gdb)
>
> The Linux backends, both in GDB and in GDBserver, already silently
> ignore failures to resume, with the understanding that we'll see an
> exit event soon.  Core of GDB doesn't do that yet, though.
>
> This patch is a small step in that direction.  It swallows the error
> when thrown from within resume_1.  There are likely are spots where we
> will need similar treatment, but we can tackle them as we find them.
>
> After this patch, we'll see something like this instead:
>
>     [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [640478.640478.0] at 0x0
>     [infrun] do_target_resume: resume_ptid=640478.0.0, step=0, sig=GDB_SIGNAL_0
>     [remote] Sending packet: $vCont;c:p9c5de.-1#78

I'm confuse by this example.  I would have expected it to start off with
the same intro as the above, that is, send the '$g#67' packet, get back
the xxxx...etc... but then do things differently.

>     [infrun] prepare_to_wait: prepare_to_wait
>     [infrun] reset: reason=handling event
>     [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote
>     [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote
>   [infrun] fetch_inferior_event: exit
>   [infrun] fetch_inferior_event: enter
>     [infrun] scoped_disable_commit_resumed: reason=handling event
>     [infrun] random_pending_event_thread: None found.
>     [remote] wait: enter
>       [remote] Packet received: W0;process:9c5de
>     [remote] wait: exit
>     [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
>     [infrun] print_target_wait_results:   640478.0.0 [process 640478],
>     [infrun] print_target_wait_results:   status->kind = EXITED, exit_status = 0
>     [infrun] handle_inferior_event: status->kind = EXITED, exit_status = 0
>   [Inferior 1 (process 640478) exited normally]
>     [infrun] stop_waiting: stop_waiting
>     [infrun] reset: reason=handling event
>   (gdb) [infrun] fetch_inferior_event: exit
>
> Change-Id: I7f1c7610923435c4e98e70acc5ebe5ebbac581e2
> ---
>  gdb/infrun.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 09391d85256..21e5aa0f50e 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -2595,7 +2595,28 @@ resume_1 (enum gdb_signal sig)
>        step = false;
>      }
>  
> -  CORE_ADDR pc = regcache_read_pc (regcache);
> +  CORE_ADDR pc = 0;

I don't think we should be picking some arbitrary $pc value (0 in this
case) and just using that as a default, instead, I think it would be
better to change the type of pc to gdb::optional<CORE_ADDR>, and then
update the rest of this function to only do the $pc relevant parts if we
have a $pc value.

> +  try
> +    {
> +      pc = regcache_read_pc (regcache);
> +    }
> +  catch (const gdb_exception_error &err)
> +    {
> +      /* Swallow errors as it may be that the current thread exited
> +	 and we've haven't seen its exit status yet.  Let the
> +	 resumption continue and we'll collect the exit event
> +	 shortly.  */
> +      if (err.error == TARGET_CLOSE_ERROR)
> +	throw;
> +
> +      if (debug_infrun)
> +	{
> +	  string_file buf;
> +	  exception_print (&buf, err);
> +	  infrun_debug_printf ("resume: swallowing error: %s",
> +			       buf.string ().c_str ());
> +	}

I guess this is the best we can probably do without changing the remote
protocol. My worry would be that there could be other reasons that the
read of $pc fails, which we are now just ignoring.  It looks like you
already ran into one such case with TARGET_CLOSE_ERROR, but maybe
there's others?

It almost feels like the ideal solution would invert the logic, so we
could write:

  catch (const gdb_exception_error &err)
    {
      /* I just invent a new error type here...  */
      if (err.err != INFERIOR_EXITED_ERROR)
        throw;

      // ... etc ...
    }

To use something like this we could have the H packet send back
something other then "OK" when GDB asks to switch to a thread that has
already exited, maybe send back the stop reply could be made to work?

I say all that really just to check if you agree or not.  I think for
now I'd be happy to go with what you present here, I think the gains
this series brings to GDB are worth some rough edges that we might want
to address in the future.

Would love to hear your thoughts,

Thanks,
Andrew

> +    }
>  
>    infrun_debug_printf ("step=%d, signal=%s, trap_expected=%d, "
>  		       "current thread [%s] at %s",
> -- 
> 2.36.0