public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gdb: fix "set scheduler-locking" thread exit hang
@ 2021-11-08 21:49 Simon Marchi
  2021-11-11 18:09 ` Pedro Alves
  0 siblings, 1 reply; 5+ messages in thread
From: Simon Marchi @ 2021-11-08 21:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Simon Marchi

GDB hangs when doing this:

 - launch inferior with multiple threads
 - multiple threads hit some breakpoint(s)
 - one breakpoint hit is presented as a stop, the rest are saved as
   pending wait statuses
 - "set scheduler-locking on"
 - resume the currently selected thread (because of scheduler-locking,
   it's the only one resumed), let it execute until exit
 - GDB hangs, not showing the prompt, impossible to interrupt with ^C

When the resumed thread exits, we expect the target to return a
TARGET_WAITKIND_NO_RESUMED event, and that's what we see:

    [infrun] fetch_inferior_event: enter
      [infrun] scoped_disable_commit_resumed: reason=handling event
      [infrun] random_pending_event_thread: None found.
    [Thread 0x7ffff7d9c700 (LWP 309357) exited]
      [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
      [infrun] print_target_wait_results:   -1.0.0 [process -1],
      [infrun] print_target_wait_results:   status->kind = no-resumed
      [infrun] handle_inferior_event: status->kind = no-resumed
      [infrun] handle_no_resumed: TARGET_WAITKIND_NO_RESUMED (ignoring: found resumed)
      [infrun] prepare_to_wait: prepare_to_wait
      [infrun] reset: reason=handling event
      [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads
    [infrun] fetch_inferior_event: exit

The problem is in handle_no_resumed: we check if some other thread is
actually resumed, to see if we should ignore that event (see comments in
that function for more info).  If this condition is true:

    (thread->executing () || thread->has_pending_waitstatus ())

... then we ignore the event.  The problem is that there are some non-resumed
threads with a pending event, which makes us ignore the event.  But these
threads are not resumed, so we end up waiting while nothing executes, hence
waiting for ever.

My first fix was to change the condition to:

    (thread->executing ()
     || (thread->resumed () && thread->has_pending_waitstatus ()))

... but then it occured to me that we could simply check for:

    (thread->resumed ())

Since "executing" implies "resumed", checking simply for "resumed"
covers threads that are resumed and executing, as well as threads that
are resumed with a pending status, which is what we want.

Change-Id: Ie796290f8ae7f34c026ca3a8fcef7397414f4780
---
 gdb/infrun.c                                  |  3 +-
 .../gdb.threads/schedlock-thread-exit.c       | 46 +++++++++++++++++++
 .../gdb.threads/schedlock-thread-exit.exp     | 42 +++++++++++++++++
 3 files changed, 89 insertions(+), 2 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/schedlock-thread-exit.c
 create mode 100644 gdb/testsuite/gdb.threads/schedlock-thread-exit.exp

diff --git a/gdb/infrun.c b/gdb/infrun.c
index c6341272bbd..95c8bfb2e7d 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -5138,8 +5138,7 @@ handle_no_resumed (struct execution_control_state *ecs)
 	  swap_terminal = false;
 	}
 
-      if (!ignore_event
-	  && (thread->executing () || thread->has_pending_waitstatus ()))
+      if (!ignore_event && thread->resumed ())
 	{
 	  /* Either there were no unwaited-for children left in the
 	     target at some point, but there are now, or some target
diff --git a/gdb/testsuite/gdb.threads/schedlock-thread-exit.c b/gdb/testsuite/gdb.threads/schedlock-thread-exit.c
new file mode 100644
index 00000000000..ea8a349ed13
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/schedlock-thread-exit.c
@@ -0,0 +1,46 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <pthread.h>
+
+static void *
+thread_func (void *p)
+{
+  return NULL;
+}
+
+int
+main (void)
+{
+  const int nthreads = 10;
+  pthread_t threads[nthreads];
+
+  for (int i = 0; i < nthreads; ++i)
+    {
+      int ret = pthread_create (&threads[i], NULL, thread_func, NULL);
+      assert (ret == 0);
+    }
+
+  for (int i = 0; i < nthreads; ++i)
+    {
+      int ret = pthread_join (threads[i], NULL);
+      assert (ret == 0);
+    }
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/schedlock-thread-exit.exp b/gdb/testsuite/gdb.threads/schedlock-thread-exit.exp
new file mode 100644
index 00000000000..818baa781db
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/schedlock-thread-exit.exp
@@ -0,0 +1,42 @@
+# Copyright 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that GDB handles well a thread exiting with sched-lock on.  A buggy GDB
+# would hang, a fixed GDB shows the "No unwaited-for children left" message and
+# shows the prompt.
+#
+# The explanation for the original bug involved some threads with a pending
+# wait status, so we launch multiple threads, which very likely gives us that.
+
+standard_testfile
+
+if { [build_executable "failed to prepare" ${testfile} ${srcfile} {debug pthreads}] } {
+    return
+}
+
+proc do_test { } {
+    clean_restart $::binfile
+
+    # One of the launched threads will report a stop on thread_func.  Some
+    # others will also stop on thread_func and have a pending status.
+    if { ![runto "thread_func"] } {
+	return
+    }
+
+    gdb_test_no_output "set scheduler-locking on"
+    gdb_test "continue" "No unwaited-for children left."
+}
+
+do_test
-- 
2.33.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] gdb: fix "set scheduler-locking" thread exit hang
  2021-11-08 21:49 [PATCH] gdb: fix "set scheduler-locking" thread exit hang Simon Marchi
@ 2021-11-11 18:09 ` Pedro Alves
  2021-11-11 18:33   ` Simon Marchi
  0 siblings, 1 reply; 5+ messages in thread
From: Pedro Alves @ 2021-11-11 18:09 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2021-11-08 21:49, Simon Marchi via Gdb-patches wrote:

> The problem is in handle_no_resumed: we check if some other thread is
> actually resumed, to see if we should ignore that event (see comments in
> that function for more info).  If this condition is true:
> 
>     (thread->executing () || thread->has_pending_waitstatus ())
> 
> ... then we ignore the event.  The problem is that there are some non-resumed
> threads with a pending event, which makes us ignore the event.  But these
> threads are not resumed, so we end up waiting while nothing executes, hence
> waiting for ever.

"There are some non-resumed" ... "But these are not resumed".

The first "non-resumed" should be "non-executing", I believe.

> 
> My first fix was to change the condition to:
> 
>     (thread->executing ()
>      || (thread->resumed () && thread->has_pending_waitstatus ()))
> 
> ... but then it occured to me that we could simply check for:
> 
>     (thread->resumed ())
> 
> Since "executing" implies "resumed", checking simply for "resumed"
> covers threads that are resumed and executing, as well as threads that
> are resumed with a pending status, which is what we want.

OK.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] gdb: fix "set scheduler-locking" thread exit hang
  2021-11-11 18:09 ` Pedro Alves
@ 2021-11-11 18:33   ` Simon Marchi
  2021-11-11 19:04     ` Pedro Alves
  0 siblings, 1 reply; 5+ messages in thread
From: Simon Marchi @ 2021-11-11 18:33 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-11-11 1:09 p.m., Pedro Alves wrote:
> On 2021-11-08 21:49, Simon Marchi via Gdb-patches wrote:
>
>> The problem is in handle_no_resumed: we check if some other thread is
>> actually resumed, to see if we should ignore that event (see comments in
>> that function for more info).  If this condition is true:
>>
>>     (thread->executing () || thread->has_pending_waitstatus ())
>>
>> ... then we ignore the event.  The problem is that there are some non-resumed
>> threads with a pending event, which makes us ignore the event.  But these
>> threads are not resumed, so we end up waiting while nothing executes, hence
>> waiting for ever.
>
> "There are some non-resumed" ... "But these are not resumed".
>
> The first "non-resumed" should be "non-executing", I believe.

Hmm, no.  These other threads ("other" meaning other than the thread
resumed under schedlock) are stopped / non-infrun-resumed.

Simon

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] gdb: fix "set scheduler-locking" thread exit hang
  2021-11-11 18:33   ` Simon Marchi
@ 2021-11-11 19:04     ` Pedro Alves
  2021-11-11 19:14       ` Simon Marchi
  0 siblings, 1 reply; 5+ messages in thread
From: Pedro Alves @ 2021-11-11 19:04 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 2021-11-11 18:33, Simon Marchi wrote:
> On 2021-11-11 1:09 p.m., Pedro Alves wrote:
>> On 2021-11-08 21:49, Simon Marchi via Gdb-patches wrote:
>>
>>> The problem is in handle_no_resumed: we check if some other thread is
>>> actually resumed, to see if we should ignore that event (see comments in
>>> that function for more info).  If this condition is true:
>>>
>>>     (thread->executing () || thread->has_pending_waitstatus ())
>>>
>>> ... then we ignore the event.  The problem is that there are some non-resumed
>>> threads with a pending event, which makes us ignore the event.  But these
>>> threads are not resumed, so we end up waiting while nothing executes, hence
>>> waiting for ever.
>>
>> "There are some non-resumed" ... "But these are not resumed".
>>
>> The first "non-resumed" should be "non-executing", I believe.
> 
> Hmm, no.  These other threads ("other" meaning other than the thread
> resumed under schedlock) are stopped / non-infrun-resumed.

I see, nevermind, I somehow read "non-resumed" vs "not resumed" as opposites
and then brain shorted out.  Sorry.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] gdb: fix "set scheduler-locking" thread exit hang
  2021-11-11 19:04     ` Pedro Alves
@ 2021-11-11 19:14       ` Simon Marchi
  0 siblings, 0 replies; 5+ messages in thread
From: Simon Marchi @ 2021-11-11 19:14 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 2021-11-11 2:04 p.m., Pedro Alves wrote:
> On 2021-11-11 18:33, Simon Marchi wrote:
>> On 2021-11-11 1:09 p.m., Pedro Alves wrote:
>>> On 2021-11-08 21:49, Simon Marchi via Gdb-patches wrote:
>>>
>>>> The problem is in handle_no_resumed: we check if some other thread is
>>>> actually resumed, to see if we should ignore that event (see comments in
>>>> that function for more info).  If this condition is true:
>>>>
>>>>     (thread->executing () || thread->has_pending_waitstatus ())
>>>>
>>>> ... then we ignore the event.  The problem is that there are some non-resumed
>>>> threads with a pending event, which makes us ignore the event.  But these
>>>> threads are not resumed, so we end up waiting while nothing executes, hence
>>>> waiting for ever.
>>>
>>> "There are some non-resumed" ... "But these are not resumed".
>>>
>>> The first "non-resumed" should be "non-executing", I believe.
>>
>> Hmm, no.  These other threads ("other" meaning other than the thread
>> resumed under schedlock) are stopped / non-infrun-resumed.
> 
> I see, nevermind, I somehow read "non-resumed" vs "not resumed" as opposites
> and then brain shorted out.  Sorry.
> 


Np, thanks for the review.  I will push the patch.

Simon

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-11-11 19:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-08 21:49 [PATCH] gdb: fix "set scheduler-locking" thread exit hang Simon Marchi
2021-11-11 18:09 ` Pedro Alves
2021-11-11 18:33   ` Simon Marchi
2021-11-11 19:04     ` Pedro Alves
2021-11-11 19:14       ` Simon Marchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).