From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Burgess <aburgess@redhat.com>
To: Pedro Alves <pedro@palves.net>, gdb-patches@sourceware.org
Subject: Re: [PATCHv6 3/6] gdb: add timeouts for inferior function calls
In-Reply-To: <87jzv2ikoc.fsf@redhat.com>
References: <cover.1680530116.git.aburgess@redhat.com>
 <2550eb8f3e77778e95bf8ded2775a31d9502f89a.1680530116.git.aburgess@redhat.com>
 <4267025a-c07d-0d82-4ea6-1638e2aeff9e@palves.net>
 <87jzv2ikoc.fsf@redhat.com>
Date: Fri, 14 Jul 2023 20:52:11 +0100
Message-ID: <87h6q6i844.fsf@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: <gdb-patches.sourceware.org>

Andrew Burgess <aburgess@redhat.com> writes:

> Pedro Alves <pedro@palves.net> writes:
>
>> On 2023-04-03 15:01, Andrew Burgess via Gdb-patches wrote:
>>
>>> diff --git a/gdb/NEWS b/gdb/NEWS
>>> index 10a1a70fa52..70987994e7b 100644
>>> --- a/gdb/NEWS
>>> +++ b/gdb/NEWS
>>> @@ -96,6 +96,24 @@ info main
>>>     $2 = 1
>>>     (gdb) break func if $_shell("some command") == 0
>>>  
>>> +set direct-call-timeout SECONDS
>>> +show direct-call-timeout
>>> +set indirect-call-timeout SECONDS
>>> +show indirect-call-timeout
>>> +  These new settings can be used to limit how long GDB will wait for
>>> +  an inferior function call to complete.  The direct timeout is used
>>> +  for inferior function calls from e.g. 'call' and 'print' commands,
>>> +  while the indirect timeout is used for inferior function calls from
>>> +  within a conditional breakpoint expression.
>>
>> What happens with expressions in other commands, basically any command that
>> accepts an expression?  For example, "x foo()".  Are those direct, or
>> indirect?  I assume direct?
>
> Correct, these would be direct.  I struggled to come up with a good
> name, but the basic idea was:
>
>   direct -- user enters a command and as a result GDB performs an
>             inferior function call.  The user can only enter the next
>             command once the first command (and hence inferior call) has
>             completed.
>
>   indirect -- user enters a command that accepts an expression,
>               e.g. breakpoint condition, but the expression is only
>               evaluated at some future time which is largely outside of
>               the user's control, e.g. when the inferior hits the
>               breakpoint.  The user might not even be aware that the
>               inferior call is taking place (as b/p conditions are not
>               announced until they complete or time out).
>
>>
>> I wonder whether you have plans/ideas for other kinds of indirect calls.
>> Just thinking about whether naming the option as something about
>> "breakpoint-condition" wouldn't be better by being more direct (ah!) and
>> to the point, while leaving the possibility of other kinds of situations
>> having different timeouts.  to avoid long command names, we could have
>> a prefix setting, like:
>
> I guess we could, but I'm not sure why a user might want such fine
> grained control -- they want to limit how long a breakpoint condition
> can take to evaluate, but want a different limit on some-other indirect
> case.  This just seems overly complex, surely you'd just pick a timeout
> that satisfies your expected worst case and go with that.
>
> To be honest, the reason I initially split direct and indirect is so
> that the direct case could be unlimited to match GDB's current
> behaviour.  But, now I've written it, I do think there's an argument
> that a user might want to allow direct calls to take longer.  In the
> direct case the user is (hopefully) aware that an inferior call has
> taken place, and can manually interrupt if the call is taking too long,
> so I think this split does make sense.
>
>>
>>  set call-timeout direct  # maybe there's a better name for this.
>>  set call-timeout breakpoint-conditions
>>  set call-timeout some-other-case
>>
>> Just some thoughts, by no way am I objecting to what you have.
>>
>>> +
>>> +  The default for the direct timeout is unlimited, while the default
>>> +  for the indirect timeout is 30 seconds.
>>
>> While working on Windows non-stop support recently, I noticed that
>> gdb.threads/multiple-successive-infcall.exp has infcalls that would
>> just hang "forever", the infcall never completed.  The test
>> enables schedlock, and then calls a function in each thread in the
>> program [like, (gdb) p get_value()].  The issue turns out to be about
>> calling a function in a thread that is currently running Windows kernel
>> code.  On Linux, most system calls are interruptible (EINTR), and
>> restartable.  When the debugger pauses a thread and the thread is in a
>> syscall, the syscall is interrupted and restarted later when the thread
>> is resumed.  On Windows, system calls are NOT interruptible.  The threads
>> in question in the testcase were stopped inside the syscall done by
>> ntdll!ZwWaitForMultipleObjects.  In that scenario, you can still pause the
>> hung thread with Ctrl-C, and you'll see that the (userspace) PC of the thread
>> in question hasn't changed, it is still pointing to the entry to the
>> function GDB wants to call -- not surprising since the thread is really
>> still blocked inside the syscall and never ran any userspace instruction.
>>
>> This looks like something that Windows GDB users are likely to trip on more
>> frequently than GNU/Linux users.
>>
>> So I looked at how Visual Studio (not vscode) handles this; maybe it
>> just doesn't let you call functions on threads that are stopped inside
>> a syscall?  Nope.  You guessed it, it handles it with a timeout.
>> If you add a watch expression (like a gdb "display") involving infcall, and the thread
>> is in kernel code, VS will still try the call, and then after a few short
>> seconds (maybe some 5s), it aborts the expression, popping a dialog box informing
>> you about it.
>>
>> All that to say that I would think it reasonable to default to a
>> shorter timeout in GDB too.
>>
>> Actually, I remembered now that LLDB also has a timeout for infcalls.
>> On the version I have handy installed, "help expression" talks about
>> a timeout of "currently .25 seconds", and then retrying with all threads
>> running, (that's 0.25s, not 25s IIUC, curiously, higher resolution than
>> second), but I don't know how long that second retry has for timeout,
>> if it has one.
>>
>> For breakpoint conditions, I think it may be nice (but not a
>> requirement of this patch, just an idea) if after some time less than
>> the whole timeout time, for GDB to print a warning, something like:
>>
>>  warning: a function call in the condition of breakpoint 2.3 is taking long.
>>
>> Like, we could print that warning after 1 second, even if the timeout
>> is set to higher than that.
>>
>> Anyhow, all that is a lot easier to code than debate and it can always
>> be done later.
>>
>>> +
>>> +  These timeouts will only have an effect for targets that are
>>> +  operating in async mode.  For non-async targets the timeouts are
>>> +  ignored, GDB will wait indefinitely for an inferior function to
>>> +  complete, unless interrupted by the user using Ctrl-C.
>>> +
>>>  * MI changes
>>>  
>>>  ** mi now reports 'no-history' as a stop reason when hitting the end of the
>>> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
>>> index fe76e5e0a0e..46f17798510 100644
>>> --- a/gdb/doc/gdb.texinfo
>>> +++ b/gdb/doc/gdb.texinfo
>>> @@ -20885,6 +20885,72 @@
>>>  @code{step}, etc).  In this case, when the inferior finally returns to
>>>  the dummy-frame, @value{GDBN} will once again halt the inferior.
>>>  
>>> +On targets that support asynchronous execution (@pxref{Background
>>> +Execution}) @value{GDBN} can place a timeout on any functions called
>>> +from @value{GDBN}.  If the timeout expires and the function call is
>>> +still ongoing, then @value{GDBN} will interrupt the program.
>>
>> In the patch introducing "set unwind-on-timeout", I think it would be
>> good to mention the setting here.  I didn't notice it being added there.
>> Because, as I read this, I wondered "OK, but what happens after GDB
>> interrupts the program?  Do we unwind according to set unwind-on-signal?" .
>>
>>> +
>>> +For targets that don't support asynchronous execution
>>> +(@pxref{Background Execution}) then timeouts for functions called from
>>> +@value{GDBN} are not supported, the timeout settings described below
>>> +will be treated as @code{unlimited}, meaning @value{GDBN} will wait
>>> +indefinitely for function call to complete, unless interrupted by the
>>> +user using @kbd{Ctrl-C}.
>>> +
>>
>> ...
>>
>>> diff --git a/gdb/infcall.c b/gdb/infcall.c
>>> index 4fb8ab07db0..bb57faf700f 100644
>>> --- a/gdb/infcall.c
>>> +++ b/gdb/infcall.c
>>> @@ -95,6 +95,53 @@ show_may_call_functions_p (struct ui_file *file, int from_tty,
>>>  	      value);
>>>  }
>>>  
>>> +/* A timeout (in seconds) for direct inferior calls.  A direct inferior
>>> +   call is one the user triggers from the prompt, e.g. with a 'call' or
>>> +   'print' command.  Compare with the definition of indirect calls below.  */
>>> +
>>> +static unsigned int direct_call_timeout = UINT_MAX;
>>> +
>>> +/* Implement 'show direct-call-timeout'.  */
>>> +
>>> +static void
>>> +show_direct_call_timeout (struct ui_file *file, int from_tty,
>>> +			  struct cmd_list_element *c, const char *value)
>>> +{
>>> +  if (target_has_execution () && !target_can_async_p ())
>>> +    gdb_printf (file, _("Current target does not support async mode, timeout "
>>> +			"for direct inferior calls is \"unlimited\".\n"));
>>> +  else if (direct_call_timeout == UINT_MAX)
>>> +    gdb_printf (file, _("Timeout for direct inferior function calls "
>>> +			"is \"unlimited\".\n"));
>>> +  else
>>> +    gdb_printf (file, _("Timeout for direct inferior function calls "
>>> +			"is \"%s seconds\".\n"), value);
>>> +}
>>> +
>>> +/* A timeout (in seconds) for indirect inferior calls.  An indirect inferior
>>> +   call is one that originates from within GDB, for example, when
>>> +   evaluating an expression for a conditional breakpoint.  Compare with
>>> +   the definition of direct calls above.  */
>>> +
>>> +static unsigned int indirect_call_timeout = 30;
>>> +
>>> +/* Implement 'show indirect-call-timeout'.  */
>>> +
>>> +static void
>>> +show_indirect_call_timeout (struct ui_file *file, int from_tty,
>>> +			  struct cmd_list_element *c, const char *value)
>>> +{
>>> +  if (target_has_execution () && !target_can_async_p ())
>>> +    gdb_printf (file, _("Current target does not support async mode, timeout "
>>> +			"for indirect inferior calls is \"unlimited\".\n"));
>>> +  else if (indirect_call_timeout == UINT_MAX)
>>> +    gdb_printf (file, _("Timeout for indirect inferior function calls "
>>> +			"is \"unlimited\".\n"));
>>> +  else
>>> +    gdb_printf (file, _("Timeout for indirect inferior function calls "
>>> +			"is \"%s seconds\".\n"), value);
>>> +}
>>> +
>>>  /* How you should pass arguments to a function depends on whether it
>>>     was defined in K&R style or prototype style.  If you define a
>>>     function using the K&R syntax that takes a `float' argument, then
>>> @@ -589,6 +636,86 @@ call_thread_fsm::should_notify_stop ()
>>>    return true;
>>>  }
>>>  
>>> +/* A class to control creation of a timer that will interrupt a thread
>>> +   during an inferior call.  */
>>> +struct infcall_timer_controller
>>> +{
>>> +  /* Setup an event-loop timer that will interrupt PTID if the inferior
>>> +     call takes too long.  DIRECT_CALL_P is true when this inferior call is
>>> +     a result of the user using a 'print' or 'call' command, and false when
>>> +     this inferior call is a result of e.g. a conditional breakpoint
>>> +     expression, this is used to select which timeout to use.  */
>>> +  infcall_timer_controller (thread_info *thr, bool direct_call_p)
>>> +    : m_thread (thr)
>>> +  {
>>> +    unsigned int timeout
>>> +      = direct_call_p ? direct_call_timeout : indirect_call_timeout;
>>> +    if (timeout < UINT_MAX && target_can_async_p ())
>>> +      {
>>> +	int ms = timeout * 1000;
>>> +	int id = create_timer (ms, infcall_timer_controller::timed_out, this);
>>> +	m_timer_id.emplace (id);
>>> +	infcall_debug_printf ("Setting up infcall timeout timer for "
>>> +			      "ptid %s: %d milliseconds",
>>> +			      m_thread->ptid.to_string ().c_str (), ms);
>>> +      }
>>> +  }
>>> +
>>> +  /* Destructor.  Ensure that the timer is removed from the event loop.  */
>>> +  ~infcall_timer_controller ()
>>> +  {
>>> +    /* If the timer has already triggered, then it will have already been
>>> +       deleted from the event loop.  If the timer has not triggered, then
>>> +       delete it now.  */
>>> +    if (m_timer_id.has_value () && !m_triggered)
>>> +      delete_timer (*m_timer_id);
>>> +
>>> +    /* Just for clarity, discard the timer id now.  */
>>> +    m_timer_id.reset ();
>>> +  }
>>> +
>>> +  /* Return true if there was a timer in place, and the timer triggered,
>>> +     otherwise, return false.  */
>>> +  bool triggered_p ()
>>> +  {
>>> +    gdb_assert (!m_triggered || m_timer_id.has_value ());
>>> +    return m_triggered;
>>> +  }
>>> +
>>> +private:
>>> +  /* The thread we should interrupt.  */
>>> +  thread_info *m_thread;
>>> +
>>> +  /* Set true when the timer is triggered.  */
>>> +  bool m_triggered = false;
>>> +
>>> +  /* Given a value when a timer is in place.  */
>>> +  gdb::optional<int> m_timer_id;
>>> +
>>> +  /* Callback for the timer, forwards to ::trigger below.  */
>>> +  static void
>>> +  timed_out (gdb_client_data context)
>>> +  {
>>> +    infcall_timer_controller *ctrl
>>> +      = static_cast<infcall_timer_controller *> (context);
>>> +    ctrl->trigger ();
>>> +  }
>>> +
>>> +  /* Called when the timer goes off.  Stop thread m_thread.  */
>>
>> Uppercase M_THREAD.
>
> Fixed.
>
>>
>>> +  void
>>> +  trigger ()
>>> +  {
>>> +    m_triggered = true;
>>> +
>>> +    scoped_disable_commit_resumed disable_commit_resumed ("infcall timeout");
>>> +
>>> +    infcall_debug_printf ("Stopping thread %s",
>>> +			  m_thread->ptid.to_string ().c_str ());
>>> +    target_stop (m_thread->ptid);
>>> +    m_thread->stop_requested = true;
>>
>> As per the discussion in the remote patch, I think this will need
>> to be adjusted.  Maybe something like:
>>
>>     if (target_is_non_stop_p ())
>>       {
>>         target_stop (m_thread->ptid);
>>         m_thread->stop_requested = true;
>>       }
>>     else
>>       target_interrupt ();
>
> I understand your critique of the 'avoid SIGINT after calling
> remote_target::stop' patch, but I don't understand this comment.  What
> we want to do is stop the target, not interrupt it, thus, surely, we
> should call target_stop.
>
> The fact that we can't target_stop for a !non-stop target is surely
> something the target should deal with.  And indeed, if we check out
> remote_target::stop we see that for !non-stop target we call
> remote_interrupt_as.  In contrast, calling remote_target::interrupt for
> a !non-stop target also calls remote_interrupt_as, which I think is very
> much the point of your critique, right?
>
> My thinking here is that, if we _did_ come up with some clever way to
> support ::stop for a !non-stop target, this code would be added to
> remote_target::stop, but _not_ to remote_target::interrupt, so we should
> call the function that matches our intention, even if, right now, GDB
> can't actually satisfy our needs.
>
>>
>>> +  }
>>> +};
>>> +
>>>  /* Subroutine of call_function_by_hand to simplify it.
>>>     Start up the inferior and wait for it to stop.
>>>     Return the exception if there's an error, or an exception with
>>> @@ -599,13 +726,15 @@ call_thread_fsm::should_notify_stop ()
>>>  
>>>  static struct gdb_exception
>>>  run_inferior_call (std::unique_ptr<call_thread_fsm> sm,
>>> -		   struct thread_info *call_thread, CORE_ADDR real_pc)
>>> +		   struct thread_info *call_thread, CORE_ADDR real_pc,
>>> +		   bool *timed_out_p)
>>>  {
>>>    INFCALL_SCOPED_DEBUG_ENTER_EXIT;
>>>  
>>>    struct gdb_exception caught_error;
>>>    ptid_t call_thread_ptid = call_thread->ptid;
>>>    int was_running = call_thread->state == THREAD_RUNNING;
>>> +  *timed_out_p = false;
>>>  
>>>    infcall_debug_printf ("call function at %s in thread %s, was_running = %d",
>>>  			core_addr_to_string (real_pc),
>>> @@ -617,6 +746,16 @@ run_inferior_call (std::unique_ptr<call_thread_fsm> sm,
>>>    scoped_restore restore_in_infcall
>>>      = make_scoped_restore (&call_thread->control.in_infcall, 1);
>>>  
>>> +  /* If the thread making the inferior call stops with a time out then the
>>> +     stop_requested flag will be set.  However, we don't want changes to
>>> +     this flag to leak back to our caller, we might be here to handle an
>>> +     inferior call from a breakpoint condition, so leaving this flag set
>>> +     would appear that the breakpoint stop was actually a requested stop,
>>> +     which is not true, and will cause GDB to print extra messages to the
>>> +     output.  */
>>> +  scoped_restore restore_stop_requested
>>> +    = make_scoped_restore (&call_thread->stop_requested, false);
>>
>> I'm confused by this.  If stop_requested was set when the breakpoint was hit,
>> are we still evaluating the breakpoint condition (and re-resuming the thread
>> if the condition is false) ?
>
> I don't really understand your question here, but I don't think that it
> matters now.  This stop_requested stuff was only in place to support the
> 'avoid SIGINT after calling remote_target::stop' patch (the next one),
> which I'm going to drop after your feedback.
>
>>
>>> +
>>>    clear_proceed_status (0);
>>>  
>>
>>> --- /dev/null
>>> +++ b/gdb/testsuite/gdb.base/infcall-timeout.c
>>> @@ -0,0 +1,36 @@
>>> +/* Copyright 2022-2023 Free Software Foundation, Inc.
>>> +
>>> +   This file is part of GDB.
>>> +
>>> +   This program is free software; you can redistribute it and/or modify
>>> +   it under the terms of the GNU General Public License as published by
>>> +   the Free Software Foundation; either version 3 of the License, or
>>> +   (at your option) any later version.
>>> +
>>> +   This program is distributed in the hope that it will be useful,
>>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +   GNU General Public License for more details.
>>> +
>>> +   You should have received a copy of the GNU General Public License
>>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>>> +
>>> +#include <unistd.h>
>>> +
>>> +/* This function is called from GDB.  */
>>> +int
>>> +function_that_never_returns ()
>>> +{
>>> +  while (1)
>>> +    sleep (1);
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +int
>>> +main ()
>>> +{
>>> +  alarm (300);
>>> +
>>> +  return 0;
>>> +}
>>> diff --git a/gdb/testsuite/gdb.base/infcall-timeout.exp b/gdb/testsuite/gdb.base/infcall-timeout.exp
>>
>>
>> ...
>>
>>> +standard_testfile
>>> +
>>> +if { [build_executable "failed to prepare" ${binfile} "${srcfile}" \
>>> +	  {debug}] == -1 } {
>>> +    return
>>> +}
>>> +
>>> +# Start GDB according to TARGET_ASYNC and TARGET_NON_STOP, then adjust
>>> +# the direct-call-timeout, and make an inferior function call that
>>> +# will never return.  GDB should eventually timeout and stop the
>>> +# inferior.
>>> +proc_with_prefix run_test { target_async target_non_stop } {
>>> +    save_vars { ::GDBFLAGS } {
>>> +	append ::GDBFLAGS \
>>> +	    " -ex \"maint set target-non-stop $target_non_stop\""
>>
>> It's curious that target-non-stop on|off is tested, but not "set non-stop on".
>
> I've extended the tests to cover this case.
>
>>
>>> +	append ::GDBFLAGS \
>>> +	    " -ex \"maintenance set target-async ${target_async}\""
>>> +
>>> +	clean_restart ${::binfile}
>>> +    }
>>> +
>>
>> ...
>>
>>> diff --git a/gdb/testsuite/gdb.threads/infcall-from-bp-cond-timeout.exp b/gdb/testsuite/gdb.threads/infcall-from-bp-cond-timeout.exp
>>> new file mode 100644
>>> index 00000000000..4159288a39c
>>> --- /dev/null
>>> +++ b/gdb/testsuite/gdb.threads/infcall-from-bp-cond-timeout.exp
>>
>> ...
>>
>>> +
>>> +    gdb_breakpoint \
>>> +	"${::srcfile}:${::cond_bp_line} if (condition_func ())"
>>> +    set bp_num [get_integer_valueof "\$bpnum" "*UNKNOWN*" \
>>> +		    "get number for conditional breakpoint"]
>>> +
>>> +    gdb_breakpoint "${::srcfile}:${::final_bp_line}"
>>> +    set final_bp_num [get_integer_valueof "\$bpnum" "*UNKNOWN*" \
>>> +			  "get number for final breakpoint"]
>>> +
>>> +    # The thread performing an inferior call relies on a second
>>> +    # thread.  The second thread will segfault unless it hits a
>>> +    # breakpoint first.  In either case the initial thread will not
>>> +    # complete its inferior call.
>>> +    if { $other_thread_bp } {
>>> +	gdb_breakpoint "${::srcfile}:${::segfault_line}"
>>> +	set segfault_bp_num [get_integer_valueof "\$bpnum" "*UNKNOWN*" \
>>> +				 "get number for segfault breakpoint"]
>>> +    }
>>> +
>>> +    # When non-stop mode is off we get slightly different output from GDB.
>>> +    if { [gdb_is_remote_or_extended_remote_target] && !$target_non_stop} {
>>> +	set stopped_line_pattern "Thread ${::decimal} \"\[^\r\n\"\]+\" received signal SIGINT, Interrupt\\."
>>> +    } else {
>>> +	set stopped_line_pattern "Thread ${::decimal} \"\[^\r\n\"\]+\" stopped\\."
>>> +    }
>>
>> Something is going on in this test that when testing against gdbserver with
>> all-stop, it is always Thread 2 that reports the SIGINT, which is coincidentally
>> the thread that was hitting the breakpoint and running the infcall, AFAICS.
>>
>>  continue
>>  Continuing.
>>  [New Thread 3506594.3506599]
>>  [New Thread 3506594.3506600]
>>  [New Thread 3506594.3506601]
>>  [New Thread 3506594.3506602]
>>  [New Thread 3506594.3506603]
>>
>>  Thread 2 "infcall-from-bp" received signal SIGINT, Interrupt.
>> __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555558080 <thread_1_semaphore>) at ./nptl/futex-internal.c:57
>>
>> Why is that?
>
> The answer lies in the 'gdb: fix b/p conditions with infcalls in
> multi-threaded inferiors' patch, specifically the change to
> user_visible_resume_ptid, which ensures that, when evaluating a B/P
> condition, only the thread evaluating the condition is resumed.
>
> The code didn't originate with me[1], but I didn't question it too much
> when I incorporated it into this series, and maybe I should have.
>
> I wonder if in all-stop mode we should be resuming all threads when
> evaluating the condition?
>
> I'll think about this some more and follow up.
>
> [1] https://inbox.sourceware.org/gdb-patches/20201009112719.629-3-natalia.saiapova@intel.com/

OK, so the post I linked includes one reason for only resuming the
thread in which the condition is being evaluated, but the more I think
about it, the less I'm sure resuming all threads would make sense.

Consider an all-stop target (rather than GDB running all-stop on top of
a non-stop target).  If all threads were resumed then there's a chance
we could hit some event in a thread other than the one executing the
b/p condition.  If that happens we're in trouble: all threads would
stop, including the b/p thread, and we'd have to abandon the
conditional breakpoint evaluation, so more b/p conditions would be
abandoned than necessary.
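
To make the all-stop concern concrete, here is a toy model of the two
resume policies.  This is not GDB code: the types, the function and the
will_report_event flag are all made up for illustration, and the model
ignores timing (a real event might or might not arrive before the
condition finishes).

```cpp
// Toy model, not GDB code: all names here are hypothetical.
#include <cassert>
#include <vector>

enum class resume_policy { only_condition_thread, all_threads };

struct toy_thread
{
  int id;
  // True if this thread would report an event (breakpoint, signal,
  // ...) when resumed.
  bool will_report_event;
};

// Return true if the b/p condition running in COND_THREAD_ID can run
// to completion on an all-stop target, i.e. no other resumed thread
// reports an event that stops every thread first.
static bool
condition_can_complete (const std::vector<toy_thread> &threads,
                        int cond_thread_id, resume_policy policy)
{
  bool found_cond_thread = false;
  bool interrupted = false;

  for (const toy_thread &thr : threads)
    {
      if (thr.id == cond_thread_id)
        {
          found_cond_thread = true;
          continue;
        }

      // Other threads only matter if the policy resumes them.
      if (policy == resume_policy::all_threads && thr.will_report_event)
        interrupted = true;
    }

  assert (found_cond_thread);
  return !interrupted;
}
```

With only the condition thread resumed, nothing another thread does can
force the evaluation to be abandoned; with all threads resumed, a single
event elsewhere is enough.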

For an all-stop on non-stop target, maybe the situation isn't so bad?
The post linked above does highlight one problem, that at the point the
b/p condition is evaluated other threads that are stopping might not be
in a state suitable for being resumed, and if two such threads both hit
a conditional breakpoint at the same time I'm really not sure what we'd
do ... resume some threads maybe?

Like I said, the more I consider it, the more it feels like just
resuming the one thread solves a bunch of problems.

What might help, I guess, would be to wrap the condition evaluation into
a thread_fsm and handle the condition evaluation asynchronously, but
that feels like a whole mountain of change -- I'd much rather merge
something like this series and then build from there.
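
For what it's worth, the rough shape I have in mind for the
asynchronous version is a small state machine driven from the event
loop.  The sketch below is only an assumption about how that could
look; it does not use GDB's real thread_fsm interface, every name in it
is hypothetical, and it needs C++17 for std::optional.

```cpp
// Toy sketch, not GDB's thread_fsm API: every name here is hypothetical.
#include <cassert>
#include <optional>

enum class cond_state { not_started, infcall_running, done, timed_out };

class toy_cond_fsm
{
public:
  // Kick off the inferior call and return to the event loop at once.
  void start ()
  {
    assert (m_state == cond_state::not_started);
    m_state = cond_state::infcall_running;
  }

  // Event-loop callback: the inferior call returned VALUE.
  void on_infcall_returned (bool value)
  {
    if (m_state != cond_state::infcall_running)
      return;  // Too late, e.g. the timeout already fired.
    m_result = value;
    m_state = cond_state::done;
  }

  // Event-loop callback: the timeout expired first.
  void on_timeout ()
  {
    if (m_state == cond_state::infcall_running)
      m_state = cond_state::timed_out;
  }

  // Empty while undecided; otherwise true means report a stop and
  // false means the condition failed and the thread can be resumed.
  std::optional<bool> should_stop () const
  {
    switch (m_state)
      {
      case cond_state::done:
        return m_result;
      case cond_state::timed_out:
        return true;
      default:
        return std::nullopt;
      }
  }

private:
  cond_state m_state = cond_state::not_started;
  bool m_result = false;
};
```

The point of the sketch is that nothing blocks while the call is in
flight: whichever callback arrives first settles the outcome, and a
late result after a timeout is ignored.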

Anyway, hopefully this explains why we're seeing the stop arrive in
thread 2, rather than the main thread.

Would be interested to hear your thoughts,

Thanks,
Andrew