public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH][gdb] Fix hang after ext sigkill
@ 2020-02-24 20:14 Tom de Vries
  2020-03-09 12:52 ` [PING][PATCH][gdb] " Tom de Vries
  2020-03-24 15:35 ` [PATCH][gdb] " Simon Marchi
  0 siblings, 2 replies; 12+ messages in thread
From: Tom de Vries @ 2020-02-24 20:14 UTC (permalink / raw)
  To: gdb-patches; +Cc: Pedro Alves

Hi,

Consider the test-case from this patch, compiled with pthread support:
...
$ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
...

After running, the program sleeps:
...
$ gdb a.out
Reading symbols from a.out...
(gdb) r
Starting program: /data/gdb_versions/devel/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77fe700 (LWP 22604)]
...

Until we interrupt it with a control-C:
...
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
(gdb)
...

If we then kill the inferior using an external SIGKILL:
...
(gdb) shell killall -s SIGKILL a.out
...
and subsequently continue:
...
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
<repeat>
...
gdb hangs which repeating the same warning.  Typing control-C no longer helps,
and we have to kill gdb using kill.

This is a regression since commit 873657b9e8 "Preserve selected thread in
all-stop w/ background execution".  The commit adds a
scoped_restore_current_thread typed variable restore_thread to
fetch_inferior_event, and the hang is caused by the constructor throwing an
exception.

Fix this by catching the exception in the constructor.

Build and reg-tested on x86_64-linux.

OK for trunk?

Thanks,
- Tom

[gdb] Fix hang after ext sigkill

gdb/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* thread.c
	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
	exception in get_frame_id.
	(scoped_restore_current_thread::~scoped_restore_current_thread):
	Handle lack of inferior.

gdb/testsuite/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* gdb.threads/hang-after-ext-sigkill.c: New test.
	* gdb.threads/hang-after-ext-sigkill.exp: New file.
	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.

---
 gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 40 ++++++++++++
 .../gdb.threads/hang-after-ext-sigkill.exp         | 73 ++++++++++++++++++++++
 gdb/testsuite/lib/gdb.exp                          |  2 +-
 gdb/thread.c                                       | 14 ++++-
 4 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
new file mode 100644
index 0000000000..bfce6c3085
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
@@ -0,0 +1,40 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+static void *
+fun (void *dummy)
+{
+  while (1)
+    sleep (1);
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  pthread_t thread;
+  pthread_create (&thread, NULL, fun, NULL);
+
+  while (1)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
new file mode 100644
index 0000000000..3702eb8c13
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
@@ -0,0 +1,73 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+	 {pthreads}] == -1} {
+    return -1
+}
+
+set res [runto main no-message temporary]
+if { $res != 1 } {
+    return -1
+}
+
+set pid -1
+gdb_test_multiple "info inferior 1" "get inferior pid" {
+    -re -wrap "process (\[0-9\]*).*" {
+       set pid $expect_out(1,string)
+       pass $gdb_test_name
+    }
+}
+if { $pid == -1 } {
+    return -1
+}
+
+gdb_test_multiple "continue" "" {
+    -re "Continuing" {
+	pass $gdb_test_name
+    }
+}
+
+send_gdb "\003"
+
+gdb_test_multiple "" "get sigint" {
+    -re -wrap "Thread \[0-9\]+ .* received signal SIGINT, Interrupt\..*" {
+       pass $gdb_test_name
+   }
+}
+
+gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
+
+# Expect the signal to arrive, and the program to terminate.
+# Ideally at the first continue, currently though at the second.
+set killed_msg "Program terminated with signal SIGKILL, Killed\."
+set no_longer_exists_msg "The program no longer exists\."
+set not_being_run_msg "The program is not being run\."
+set msg [join [list $killed_msg $no_longer_exists_msg $not_being_run_msg] \
+	     ".*"]
+
+# Don't expect '$gdb_prompt $', there still may be output after the prompt,
+# f.i.:
+# ...
+# (gdb) [Thread 0x7ffff74c6700 (LWP 19277) exited]^M
+# ...
+gdb_test_multiple "continue" "" {
+    -re ".*\r\n$gdb_prompt " {
+	gdb_test "continue" $msg $gdb_test_name
+    }
+}
+
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 81518b9646..745694df2d 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -571,7 +571,7 @@ proc runto { function args } {
 	    }
 	    return 1
 	}
-	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
+	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
 	    if { $print_pass } {
 		pass $test_name
 	    }
diff --git a/gdb/thread.c b/gdb/thread.c
index 54b59e2244..9876ca3c76 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore ()
 
 scoped_restore_current_thread::~scoped_restore_current_thread ()
 {
+  if (m_inf == NULL)
+    return;
   if (!m_dont_restore)
     {
       try
@@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
       else
 	frame = NULL;
 
-      m_selected_frame_id = get_frame_id (frame);
+      try
+       {
+	 m_selected_frame_id = get_frame_id (frame);
+       }
+      catch (const gdb_exception &ex)
+       {
+	 m_inf = NULL;
+	 m_selected_frame_id = null_frame_id;
+	 m_selected_frame_level = -1;
+	 return;
+       }
       m_selected_frame_level = frame_relative_level (frame);
 
       tp->incref ();

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PING][PATCH][gdb] Fix hang after ext sigkill
  2020-02-24 20:14 [PATCH][gdb] Fix hang after ext sigkill Tom de Vries
@ 2020-03-09 12:52 ` Tom de Vries
  2020-03-23 19:16   ` [PING^2][PATCH][gdb] " Tom de Vries
  2020-03-24 15:35 ` [PATCH][gdb] " Simon Marchi
  1 sibling, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2020-03-09 12:52 UTC (permalink / raw)
  To: gdb-patches; +Cc: Pedro Alves

On 24-02-2020 21:14, Tom de Vries wrote:
> Hi,
> 
> Consider the test-case from this patch, compiled with pthread support:
> ...
> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
> ...
> 
> After running, the program sleeps:
> ...
> $ gdb a.out
> Reading symbols from a.out...
> (gdb) r
> Starting program: /data/gdb_versions/devel/a.out
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff77fe700 (LWP 22604)]
> ...
> 
> Until we interrupt it with a control-C:
> ...
> ^C
> Thread 1 "a.out" received signal SIGINT, Interrupt.
> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
> (gdb)
> ...
> 
> If we then kill the inferior using an external SIGKILL:
> ...
> (gdb) shell killall -s SIGKILL a.out
> ...
> and subsequently continue:
> ...
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> <repeat>
> ...
> gdb hangs which repeating the same warning.  Typing control-C no longer helps,
> and we have to kill gdb using kill.
> 
> This is a regression since commit 873657b9e8 "Preserve selected thread in
> all-stop w/ background execution".  The commit adds a
> scoped_restore_current_thread typed variable restore_thread to
> fetch_inferior_event, and the hang is caused by the constructor throwing an
> exception.
> 
> Fix this by catching the exception in the constructor.
> 
> Build and reg-tested on x86_64-linux.
> 
> OK for trunk?
> 

Ping.

Thanks,
- Tom

> [gdb] Fix hang after ext sigkill
> 
> gdb/ChangeLog:
> 
> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
> 	* thread.c
> 	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
> 	exception in get_frame_id.
> 	(scoped_restore_current_thread::~scoped_restore_current_thread):
> 	Handle lack of inferior.
> 
> gdb/testsuite/ChangeLog:
> 
> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
> 	* gdb.threads/hang-after-ext-sigkill.c: New test.
> 	* gdb.threads/hang-after-ext-sigkill.exp: New file.
> 	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
> 
> ---
>  gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 40 ++++++++++++
>  .../gdb.threads/hang-after-ext-sigkill.exp         | 73 ++++++++++++++++++++++
>  gdb/testsuite/lib/gdb.exp                          |  2 +-
>  gdb/thread.c                                       | 14 ++++-
>  4 files changed, 127 insertions(+), 2 deletions(-)
> 
> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
> new file mode 100644
> index 0000000000..bfce6c3085
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
> @@ -0,0 +1,40 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2020 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#include <pthread.h>
> +#include <unistd.h>
> +
> +static void *
> +fun (void *dummy)
> +{
> +  while (1)
> +    sleep (1);
> +
> +  return NULL;
> +}
> +
> +int
> +main (void)
> +{
> +  pthread_t thread;
> +  pthread_create (&thread, NULL, fun, NULL);
> +
> +  while (1)
> +    sleep (1);
> +
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
> new file mode 100644
> index 0000000000..3702eb8c13
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
> @@ -0,0 +1,73 @@
> +# Copyright (C) 2020 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +standard_testfile
> +
> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
> +	 {pthreads}] == -1} {
> +    return -1
> +}
> +
> +set res [runto main no-message temporary]
> +if { $res != 1 } {
> +    return -1
> +}
> +
> +set pid -1
> +gdb_test_multiple "info inferior 1" "get inferior pid" {
> +    -re -wrap "process (\[0-9\]*).*" {
> +       set pid $expect_out(1,string)
> +       pass $gdb_test_name
> +    }
> +}
> +if { $pid == -1 } {
> +    return -1
> +}
> +
> +gdb_test_multiple "continue" "" {
> +    -re "Continuing" {
> +	pass $gdb_test_name
> +    }
> +}
> +
> +send_gdb "\003"
> +
> +gdb_test_multiple "" "get sigint" {
> +    -re -wrap "Thread \[0-9\]+ .* received signal SIGINT, Interrupt\..*" {
> +       pass $gdb_test_name
> +   }
> +}
> +
> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
> +
> +# Expect the signal to arrive, and the program to terminate.
> +# Ideally at the first continue, currently though at the second.
> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
> +set no_longer_exists_msg "The program no longer exists\."
> +set not_being_run_msg "The program is not being run\."
> +set msg [join [list $killed_msg $no_longer_exists_msg $not_being_run_msg] \
> +	     ".*"]
> +
> +# Don't expect '$gdb_prompt $', there still may be output after the prompt,
> +# f.i.:
> +# ...
> +# (gdb) [Thread 0x7ffff74c6700 (LWP 19277) exited]^M
> +# ...
> +gdb_test_multiple "continue" "" {
> +    -re ".*\r\n$gdb_prompt " {
> +	gdb_test "continue" $msg $gdb_test_name
> +    }
> +}
> +
> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
> index 81518b9646..745694df2d 100644
> --- a/gdb/testsuite/lib/gdb.exp
> +++ b/gdb/testsuite/lib/gdb.exp
> @@ -571,7 +571,7 @@ proc runto { function args } {
>  	    }
>  	    return 1
>  	}
> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>  	    if { $print_pass } {
>  		pass $test_name
>  	    }
> diff --git a/gdb/thread.c b/gdb/thread.c
> index 54b59e2244..9876ca3c76 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore ()
>  
>  scoped_restore_current_thread::~scoped_restore_current_thread ()
>  {
> +  if (m_inf == NULL)
> +    return;
>    if (!m_dont_restore)
>      {
>        try
> @@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>        else
>  	frame = NULL;
>  
> -      m_selected_frame_id = get_frame_id (frame);
> +      try
> +       {
> +	 m_selected_frame_id = get_frame_id (frame);
> +       }
> +      catch (const gdb_exception &ex)
> +       {
> +	 m_inf = NULL;
> +	 m_selected_frame_id = null_frame_id;
> +	 m_selected_frame_level = -1;
> +	 return;
> +       }
>        m_selected_frame_level = frame_relative_level (frame);
>  
>        tp->incref ();
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PING^2][PATCH][gdb] Fix hang after ext sigkill
  2020-03-09 12:52 ` [PING][PATCH][gdb] " Tom de Vries
@ 2020-03-23 19:16   ` Tom de Vries
  0 siblings, 0 replies; 12+ messages in thread
From: Tom de Vries @ 2020-03-23 19:16 UTC (permalink / raw)
  To: gdb-patches; +Cc: Pedro Alves

On 09-03-2020 13:52, Tom de Vries wrote:
> On 24-02-2020 21:14, Tom de Vries wrote:
>> Hi,
>>
>> Consider the test-case from this patch, compiled with pthread support:
>> ...
>> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
>> ...
>>
>> After running, the program sleeps:
>> ...
>> $ gdb a.out
>> Reading symbols from a.out...
>> (gdb) r
>> Starting program: /data/gdb_versions/devel/a.out
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [New Thread 0x7ffff77fe700 (LWP 22604)]
>> ...
>>
>> Until we interrupt it with a control-C:
>> ...
>> ^C
>> Thread 1 "a.out" received signal SIGINT, Interrupt.
>> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
>> (gdb)
>> ...
>>
>> If we then kill the inferior using an external SIGKILL:
>> ...
>> (gdb) shell killall -s SIGKILL a.out
>> ...
>> and subsequently continue:
>> ...
>> (gdb) c
>> Continuing.
>> Couldn't get registers: No such process.
>> Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> <repeat>
>> ...
>> gdb hangs which repeating the same warning.  Typing control-C no longer helps,
>> and we have to kill gdb using kill.
>>
>> This is a regression since commit 873657b9e8 "Preserve selected thread in
>> all-stop w/ background execution".  The commit adds a
>> scoped_restore_current_thread typed variable restore_thread to
>> fetch_inferior_event, and the hang is caused by the constructor throwing an
>> exception.
>>
>> Fix this by catching the exception in the constructor.
>>
>> Build and reg-tested on x86_64-linux.
>>
>> OK for trunk?
>>
> 

Ping^2.

Thanks,
- Tom
> 
>> [gdb] Fix hang after ext sigkill
>>
>> gdb/ChangeLog:
>>
>> 2020-02-24  Tom de Vries  <tdevries@suse.de>
>>
>> 	PR gdb/25471
>> 	* thread.c
>> 	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
>> 	exception in get_frame_id.
>> 	(scoped_restore_current_thread::~scoped_restore_current_thread):
>> 	Handle lack of inferior.
>>
>> gdb/testsuite/ChangeLog:
>>
>> 2020-02-24  Tom de Vries  <tdevries@suse.de>
>>
>> 	PR gdb/25471
>> 	* gdb.threads/hang-after-ext-sigkill.c: New test.
>> 	* gdb.threads/hang-after-ext-sigkill.exp: New file.
>> 	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
>>
>> ---
>>  gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 40 ++++++++++++
>>  .../gdb.threads/hang-after-ext-sigkill.exp         | 73 ++++++++++++++++++++++
>>  gdb/testsuite/lib/gdb.exp                          |  2 +-
>>  gdb/thread.c                                       | 14 ++++-
>>  4 files changed, 127 insertions(+), 2 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> new file mode 100644
>> index 0000000000..bfce6c3085
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> @@ -0,0 +1,40 @@
>> +/* This testcase is part of GDB, the GNU debugger.
>> +
>> +   Copyright 2020 Free Software Foundation, Inc.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +#include <pthread.h>
>> +#include <unistd.h>
>> +
>> +static void *
>> +fun (void *dummy)
>> +{
>> +  while (1)
>> +    sleep (1);
>> +
>> +  return NULL;
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  pthread_t thread;
>> +  pthread_create (&thread, NULL, fun, NULL);
>> +
>> +  while (1)
>> +    sleep (1);
>> +
>> +  return 0;
>> +}
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> new file mode 100644
>> index 0000000000..3702eb8c13
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> @@ -0,0 +1,73 @@
>> +# Copyright (C) 2020 Free Software Foundation, Inc.
>> +
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +standard_testfile
>> +
>> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
>> +	 {pthreads}] == -1} {
>> +    return -1
>> +}
>> +
>> +set res [runto main no-message temporary]
>> +if { $res != 1 } {
>> +    return -1
>> +}
>> +
>> +set pid -1
>> +gdb_test_multiple "info inferior 1" "get inferior pid" {
>> +    -re -wrap "process (\[0-9\]*).*" {
>> +       set pid $expect_out(1,string)
>> +       pass $gdb_test_name
>> +    }
>> +}
>> +if { $pid == -1 } {
>> +    return -1
>> +}
>> +
>> +gdb_test_multiple "continue" "" {
>> +    -re "Continuing" {
>> +	pass $gdb_test_name
>> +    }
>> +}
>> +
>> +send_gdb "\003"
>> +
>> +gdb_test_multiple "" "get sigint" {
>> +    -re -wrap "Thread \[0-9\]+ .* received signal SIGINT, Interrupt\..*" {
>> +       pass $gdb_test_name
>> +   }
>> +}
>> +
>> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
>> +
>> +# Expect the signal to arrive, and the program to terminate.
>> +# Ideally at the first continue, currently though at the second.
>> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
>> +set no_longer_exists_msg "The program no longer exists\."
>> +set not_being_run_msg "The program is not being run\."
>> +set msg [join [list $killed_msg $no_longer_exists_msg $not_being_run_msg] \
>> +	     ".*"]
>> +
>> +# Don't expect '$gdb_prompt $', there still may be output after the prompt,
>> +# f.i.:
>> +# ...
>> +# (gdb) [Thread 0x7ffff74c6700 (LWP 19277) exited]^M
>> +# ...
>> +gdb_test_multiple "continue" "" {
>> +    -re ".*\r\n$gdb_prompt " {
>> +	gdb_test "continue" $msg $gdb_test_name
>> +    }
>> +}
>> +
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index 81518b9646..745694df2d 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -571,7 +571,7 @@ proc runto { function args } {
>>  	    }
>>  	    return 1
>>  	}
>> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
>> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>>  	    if { $print_pass } {
>>  		pass $test_name
>>  	    }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index 54b59e2244..9876ca3c76 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore ()
>>  
>>  scoped_restore_current_thread::~scoped_restore_current_thread ()
>>  {
>> +  if (m_inf == NULL)
>> +    return;
>>    if (!m_dont_restore)
>>      {
>>        try
>> @@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>>        else
>>  	frame = NULL;
>>  
>> -      m_selected_frame_id = get_frame_id (frame);
>> +      try
>> +       {
>> +	 m_selected_frame_id = get_frame_id (frame);
>> +       }
>> +      catch (const gdb_exception &ex)
>> +       {
>> +	 m_inf = NULL;
>> +	 m_selected_frame_id = null_frame_id;
>> +	 m_selected_frame_level = -1;
>> +	 return;
>> +       }
>>        m_selected_frame_level = frame_relative_level (frame);
>>  
>>        tp->incref ();
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-02-24 20:14 [PATCH][gdb] Fix hang after ext sigkill Tom de Vries
  2020-03-09 12:52 ` [PING][PATCH][gdb] " Tom de Vries
@ 2020-03-24 15:35 ` Simon Marchi
  2020-03-25 10:29   ` Tom de Vries
  1 sibling, 1 reply; 12+ messages in thread
From: Simon Marchi @ 2020-03-24 15:35 UTC (permalink / raw)
  To: Tom de Vries, gdb-patches; +Cc: Pedro Alves

[-- Attachment #1: Type: text/plain, Size: 3811 bytes --]

Hi Tom,

The test fails for me more often than not.  I've attached a gdb.log showing one such
failure.  There seems to be a problem matching the output of the last "continue".

This is what I see when I reproduce the case by hand:

(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) [Thread 0x7ffff7d99700 (LWP 514079) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.

I didn't really understand why we saw a prompt coming back before the other messages,
so I looked into it a bit and this is what I think happens:

1. While we are stopped at the prompt, the linux-nat event handler is unregistered from the event loop
2. Inferior gets SIGKILL'ed, so GDB gets SIGCHLD'ed, that posts an event to the event pipe, but since it's
   not registered in the event loop, nothing happens
3. User does continue, the linux-nat event handler gets registered with the event loop.  Then, the continue
   fails (No such process), which brings us back at the prompt.  However, the linux-nat event handler stays
   registered with the event loop.
4. When we come back to the event loop, we process the event for the SIGKILL, which makes GDB print the
   thread exit message and "Program terminated" message.

Normally, after the "continue" fails, I don't think we would want to leave the linux-nat
handler registered with the event loop: if it was not registered before, why should it be
registered after?  However, if it wasn't left there, we wouldn't see the messages saying
the program has terminated, so that wouldn't be good either.

Maybe there's a better way to handle it?

However, I still think it would be a good idea to merge a patch like yours, it's already a
step forward (especially since it fixes a regression).

> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
> index 81518b9646..745694df2d 100644
> --- a/gdb/testsuite/lib/gdb.exp
> +++ b/gdb/testsuite/lib/gdb.exp
> @@ -571,7 +571,7 @@ proc runto { function args } {
>  	    }
>  	    return 1
>  	}
> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>  	    if { $print_pass } {
>  		pass $test_name
>  	    }
> diff --git a/gdb/thread.c b/gdb/thread.c
> index 54b59e2244..9876ca3c76 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore ()
>  
>  scoped_restore_current_thread::~scoped_restore_current_thread ()
>  {
> +  if (m_inf == NULL)
> +    return;
>    if (!m_dont_restore)
>      {
>        try
> @@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>        else
>  	frame = NULL;
>  
> -      m_selected_frame_id = get_frame_id (frame);
> +      try
> +       {
> +	 m_selected_frame_id = get_frame_id (frame);
> +       }
> +      catch (const gdb_exception &ex)
> +       {
> +	 m_inf = NULL;
> +	 m_selected_frame_id = null_frame_id;
> +	 m_selected_frame_level = -1;
> +	 return;
> +       }

The indentation is a bit off here.

Instead of clearing everything, I think we should just set m_selected_frame_id to
null_frame_id and m_selected_frame_level to -1.  m_inf and m_thread can still be
set.  This way, the right inferior will be restored, at least, I think this is
desirable.  scoped_restore_current_thread::restore handles well the case where the
thread to restore has exited.  The thread_info object is refcounted for this exact
use case, where the thread would get deleted while

In other words:

      try
	{
	  m_selected_frame_id = get_frame_id (frame);
	  m_selected_frame_level = frame_relative_level (frame);
	}
      catch (const gdb_exception &ex)
	{
	  m_selected_frame_id = null_frame_id;
	  m_selected_frame_level = -1;
	}

Simon

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: gdb.log --]
[-- Type: text/x-log; name="gdb.log", Size: 7571 bytes --]

Test run by simark on Tue Mar 24 10:46:56 2020
Native configuration is x86_64-pc-linux-gnu

		=== gdb tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/simark/src/binutils-gdb/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file.
Running /home/simark/src/binutils-gdb/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp ...
get_compiler_info: gcc-9-3-0
Executing on host: gcc  -fno-stack-protector  -fdiagnostics-color=never -c  -o /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill0.o /home/simark/src/binutils-gdb/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c    (timeout = 300)
spawn -ignore SIGHUP gcc -fno-stack-protector -fdiagnostics-color=never -c -o /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill0.o /home/simark/src/binutils-gdb/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
Executing on host: gcc  -fno-stack-protector /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill0.o  -fdiagnostics-color=never  -lpthreads -lm   -o /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill    (timeout = 300)
spawn -ignore SIGHUP gcc -fno-stack-protector /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill0.o -fdiagnostics-color=never -lpthreads -lm -o /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
compiler exited with status 1
output is:
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status

Executing on host: gcc  -fno-stack-protector /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill0.o  -fdiagnostics-color=never  -lpthread -lm   -o /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill    (timeout = 300)
spawn -ignore SIGHUP gcc -fno-stack-protector /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill0.o -fdiagnostics-color=never -lpthread -lm -o /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill
PASS: gdb.threads/hang-after-ext-sigkill.exp: successfully compiled posix threads test case
spawn /home/simark/build/binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/simark/build/binutils-gdb/gdb/testsuite/../data-directory
warning: Found custom handler for signal 7 (Bus error) preinstalled.
warning: Found custom handler for signal 8 (Floating point exception) preinstalled.
warning: Found custom handler for signal 11 (Segmentation fault) preinstalled.
Some signal dispositions inherited from the environment (SIG_DFL/SIG_IGN)
won't be propagated to spawned programs.
GNU gdb (GDB) 10.0.50.20200324-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) set height 0
(gdb) set width 0
(gdb) dir
Reinitialize source path to empty? (y or n) y
Source directories searched: $cdir:$cwd
(gdb) dir /home/simark/src/binutils-gdb/gdb/testsuite/gdb.threads
Source directories searched: /home/simark/src/binutils-gdb/gdb/testsuite/gdb.threads:$cdir:$cwd
(gdb) kill
The program is not being run.
(gdb) file /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill
Reading symbols from /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill...
(No debugging symbols found in /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill)
(gdb) delete breakpoints
(gdb) info breakpoints
No breakpoints or watchpoints.
(gdb) tbreak main
Temporary breakpoint 1 at 0x1165
(gdb) run 
Starting program: /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".

Temporary breakpoint 1, 0x0000555555555165 in main ()
(gdb) info inferior 1
  Num  Description       Connection           Executable        
* 1    process 511301    1 (native)           /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/hang-after-ext-sigkill/hang-after-ext-sigkill 
(gdb) PASS: gdb.threads/hang-after-ext-sigkill.exp: get inferior pid
continue
Continuing.
PASS: gdb.threads/hang-after-ext-sigkill.exp: continue
[New Thread 0x7ffff7c53700 (LWP 511305)]
^C
Thread 2 "hang-after-ext-" received signal SIGINT, Interrupt.
[Switching to Thread 0x7ffff7c53700 (LWP 511305)]
0x00007ffff7d563c5 in clone () from /usr/lib/libc.so.6
(gdb) PASS: gdb.threads/hang-after-ext-sigkill.exp: get sigint
shell kill -s SIGKILL 511301
(gdb) PASS: gdb.threads/hang-after-ext-sigkill.exp: shell kill -s SIGKILL pid
continue
Continuing.
Couldn't get registers: No such process.
(gdb) continue
Continuing.
Couldn't get registers: No such process.
(gdb) FAIL: gdb.threads/hang-after-ext-sigkill.exp: continue
testcase /home/simark/src/binutils-gdb/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp completed in 1 seconds

		=== gdb Summary ===

# of expected passes		5
# of unexpected failures	1
Executing on host: /home/simark/build/binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/simark/build/binutils-gdb/gdb/testsuite/../data-directory --version    (timeout = 300)
spawn -ignore SIGHUP /home/simark/build/binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/simark/build/binutils-gdb/gdb/testsuite/../data-directory --version
warning: Found custom handler for signal 7 (Bus error) preinstalled.
warning: Found custom handler for signal 8 (Floating point exception) preinstalled.
warning: Found custom handler for signal 11 (Segmentation fault) preinstalled.
Some signal dispositions inherited from the environment (SIG_DFL/SIG_IGN)
won't be propagated to spawned programs.
GNU gdb (GDB) 10.0.50.20200324-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
/home/simark/build/binutils-gdb/gdb/gdb version  11 -nw -nx -data-directory /home/simark/build/binutils-gdb/gdb/testsuite/../data-directory 

runtest completed at Tue Mar 24 10:46:58 2020

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-03-24 15:35 ` [PATCH][gdb] " Simon Marchi
@ 2020-03-25 10:29   ` Tom de Vries
  2020-03-25 14:44     ` Simon Marchi
  0 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2020-03-25 10:29 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Pedro Alves

[-- Attachment #1: Type: text/plain, Size: 4477 bytes --]

On 24-03-2020 16:35, Simon Marchi wrote:
> Hi Tom,
> 
> The test fails for me more often than not.  I've attached a gdb.log showing one such
> failure.  There seems to be a problem matching the output of the last "continue".
> 

Hi Simon,

I've managed to reproduce that, by running the test-case in parallel
with stress -c 5.

> This is what I see when I reproduce the case by hand:
> 
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) [Thread 0x7ffff7d99700 (LWP 514079) exited]
> 
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> 

I've modified the test-case to check for the amount of "Couldn't get
registers: No such process." between the continue and the following gdb
prompt, and handle that appropriately. Hopefully this will fix the FAILs
you're seeing.

> I didn't really understand why we saw a prompt coming back before the other messages,
> so I looked into it a bit and this is what I think happens:
> 
> 1. While we are stopped at the prompt, the linux-nat event handler is unregistered from the event loop
> 2. Inferior gets SIGKILL'ed, so GDB gets SIGCHLD'ed, that posts an event to the event pipe, but since it's
>    not registered in the event loop, nothing happens
> 3. User does continue, the linux-nat event handler gets registered with the event loop.  Then, the continue
>    fails (No such process), which brings us back at the prompt.  However, the linux-nat event handler stays
>    registered with the event loop.
> 4. When we come back to the event loop, we process the event for the SIGKILL, which makes GDB print the
>    thread exit message and "Program terminated" message.
> 
> Normally, after the "continue" fails, I don't think we would want to leave the linux-nat
> handler registered with the event loop: if it was not registered before, why should it be
> registered after?  However, if it wasn't left there, we wouldn't see the messages saying
> the program has terminated, so that wouldn't be good either.
> 
> Maybe there's a better way to handle it?
> 

Thanks for the investigation. Unfortunately I'm not able to comment on this.

> However, I still think it would be a good idea to merge a patch like yours, it's already a
> step forward (especially since it fixes a regression).
> 
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index 81518b9646..745694df2d 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -571,7 +571,7 @@ proc runto { function args } {
>>  	    }
>>  	    return 1
>>  	}
>> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
>> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>>  	    if { $print_pass } {
>>  		pass $test_name
>>  	    }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index 54b59e2244..9876ca3c76 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore ()
>>  
>>  scoped_restore_current_thread::~scoped_restore_current_thread ()
>>  {
>> +  if (m_inf == NULL)
>> +    return;
>>    if (!m_dont_restore)
>>      {
>>        try
>> @@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>>        else
>>  	frame = NULL;
>>  
>> -      m_selected_frame_id = get_frame_id (frame);
>> +      try
>> +       {
>> +	 m_selected_frame_id = get_frame_id (frame);
>> +       }
>> +      catch (const gdb_exception &ex)
>> +       {
>> +	 m_inf = NULL;
>> +	 m_selected_frame_id = null_frame_id;
>> +	 m_selected_frame_level = -1;
>> +	 return;
>> +       }
> 
> The indentation is a bit off here.
> 
> Instead of clearing everything, I think we should just set m_selected_frame_id to
> null_frame_id and m_selected_frame_level to -1.  m_inf and m_thread can still be
> set.  This way, the right inferior will be restored, at least, I think this is
> desirable.  scoped_restore_current_thread::restore handles well the case where the
> thread to restore has exited.  The thread_info object is refcounted for this exact
> use case, where the thread would get deleted while
> 
> In other words:
> 
>       try
> 	{
> 	  m_selected_frame_id = get_frame_id (frame);
> 	  m_selected_frame_level = frame_relative_level (frame);
> 	}
>       catch (const gdb_exception &ex)
> 	{
> 	  m_selected_frame_id = null_frame_id;
> 	  m_selected_frame_level = -1;
> 	}

Yep, that works.

Here's the updated patch.

Thanks,
- Tom

[-- Attachment #2: 0001-gdb-Fix-hang-after-ext-sigkill.patch --]
[-- Type: text/x-patch, Size: 7593 bytes --]

[gdb] Fix hang after ext sigkill

Consider the test-case from this patch, compiled with pthread support:
...
$ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
...

After running, the program sleeps:
...
$ gdb a.out
Reading symbols from a.out...
(gdb) r
Starting program: /data/gdb_versions/devel/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77fe700 (LWP 22604)]
...

Until we interrupt it with a control-C:
...
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
(gdb)
...

If we then kill the inferior using an external SIGKILL:
...
(gdb) shell killall -s SIGKILL a.out
...
and subsequently continue:
...
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
<repeat>
...
gdb hangs repeating the same warning.  Typing control-C no longer helps,
and we have to kill gdb.

This is a regression since commit 873657b9e8 "Preserve selected thread in
all-stop w/ background execution".  The commit adds a
scoped_restore_current_thread typed variable restore_thread to
fetch_inferior_event, and the hang is caused by the constructor throwing an
exception.

Fix this by catching the exception in the constructor.

Build and reg-tested on x86_64-linux.

gdb/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* thread.c
	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
	exception in get_frame_id.

gdb/testsuite/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* gdb.threads/hang-after-ext-sigkill.c: New test.
	* gdb.threads/hang-after-ext-sigkill.exp: New file.
	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.

---
 gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 40 ++++++++++
 .../gdb.threads/hang-after-ext-sigkill.exp         | 85 ++++++++++++++++++++++
 gdb/testsuite/lib/gdb.exp                          |  2 +-
 gdb/thread.c                                       | 12 ++-
 4 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
new file mode 100644
index 0000000000..bfce6c3085
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
@@ -0,0 +1,40 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+static void *
+fun (void *dummy)
+{
+  while (1)
+    sleep (1);
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  pthread_t thread;
+  pthread_create (&thread, NULL, fun, NULL);
+
+  while (1)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
new file mode 100644
index 0000000000..37577592cd
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
@@ -0,0 +1,85 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+	 {pthreads}] == -1} {
+    return -1
+}
+
+set res [runto main no-message temporary]
+if { $res != 1 } {
+    return -1
+}
+
+set pid -1
+gdb_test_multiple "info inferior 1" "get inferior pid" {
+    -re -wrap "process (\[0-9\]*).*" {
+       set pid $expect_out(1,string)
+       pass $gdb_test_name
+    }
+}
+if { $pid == -1 } {
+    return -1
+}
+
+gdb_test_multiple "continue" "" {
+    -re "Continuing" {
+	pass $gdb_test_name
+    }
+}
+
+send_gdb "\003"
+
+gdb_test_multiple "" "get sigint" {
+    -re -wrap "received signal SIGINT, Interrupt\..*" {
+       pass $gdb_test_name
+   }
+}
+
+gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
+
+set no_such_process_msg "Couldn't get registers: No such process\."
+set killed_msg "Program terminated with signal SIGKILL, Killed\."
+set no_longer_exists_msg "The program no longer exists\."
+set not_being_run_msg "The program is not being run\."
+
+gdb_test_multiple "continue" "prompt after first continue" {
+    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
+	pass $gdb_test_name
+	# Regular output, bug condition was not triggered, we're done.
+	return -1
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
+	pass $gdb_test_name
+	# Two times $no_such_process_msg.  The bug condition was triggered, go
+	# check for it.
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
+	pass $gdb_test_name
+	# One time $no_such_process_msg.  We're stuck here.  The bug condition
+	# was not triggered, but we're not getting correct gdb behaviour either:
+	# every subsequent continue produces one no_such_process_msg.  Give up.
+	return -1
+    }
+}
+
+gdb_test_multiple "" "messages" {
+    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
+	pass $gdb_test_name
+	gdb_test "continue" $not_being_run_msg "second continue"
+    }
+}
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index e17ac0ef75..4cf2beca00 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -570,7 +570,7 @@ proc runto { function args } {
 	    }
 	    return 1
 	}
-	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
+	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
 	    if { $print_pass } {
 		pass $test_name
 	    }
diff --git a/gdb/thread.c b/gdb/thread.c
index c6e3d356a5..d287bce45f 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
       else
 	frame = NULL;
 
-      m_selected_frame_id = get_frame_id (frame);
-      m_selected_frame_level = frame_relative_level (frame);
+      try
+	{
+	  m_selected_frame_id = get_frame_id (frame);
+	  m_selected_frame_level = frame_relative_level (frame);
+	}
+      catch (const gdb_exception &ex)
+	{
+	  m_selected_frame_id = null_frame_id;
+	  m_selected_frame_level = -1;
+	}
 
       tp->incref ();
       m_thread = tp;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-03-25 10:29   ` Tom de Vries
@ 2020-03-25 14:44     ` Simon Marchi
  2020-03-25 15:51       ` Tom de Vries
  0 siblings, 1 reply; 12+ messages in thread
From: Simon Marchi @ 2020-03-25 14:44 UTC (permalink / raw)
  To: Tom de Vries, gdb-patches; +Cc: Pedro Alves

On 2020-03-25 6:29 a.m., Tom de Vries wrote:
> Here's the updated patch.

Thanks.  Some comments about the test:

- Please add a comment at the top to describe briefly what this is testing.
- Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
  so that the test program eventually exits if something goes wrong and it is allowed to run
  freely.

Simon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-03-25 14:44     ` Simon Marchi
@ 2020-03-25 15:51       ` Tom de Vries
  2020-03-25 15:57         ` Simon Marchi
  2020-04-16 13:28         ` Pedro Alves
  0 siblings, 2 replies; 12+ messages in thread
From: Tom de Vries @ 2020-03-25 15:51 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Pedro Alves

[-- Attachment #1: Type: text/plain, Size: 463 bytes --]

On 25-03-2020 15:44, Simon Marchi wrote:
> On 2020-03-25 6:29 a.m., Tom de Vries wrote:
>> Here's the updated patch.
> 
> Thanks.  Some comments about the test:
> 
> - Please add a comment at the top to describe briefly what this is testing.
> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
>   so that the test program eventually exits if something goes wrong and it is allowed to run
>   freely.

Done.

Thanks,
- Tom



[-- Attachment #2: 0001-gdb-Fix-hang-after-ext-sigkill.patch --]
[-- Type: text/x-patch, Size: 7776 bytes --]

[gdb] Fix hang after ext sigkill

Consider the test-case from this patch, compiled with pthread support:
...
$ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
...

After running, the program sleeps:
...
$ gdb a.out
Reading symbols from a.out...
(gdb) r
Starting program: /data/gdb_versions/devel/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77fe700 (LWP 22604)]
...

Until we interrupt it with a control-C:
...
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
(gdb)
...

If we then kill the inferior using an external SIGKILL:
...
(gdb) shell killall -s SIGKILL a.out
...
and subsequently continue:
...
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
<repeat>
...
gdb hangs repeating the same warning.  Typing control-C no longer helps,
and we have to kill gdb.

This is a regression since commit 873657b9e8 "Preserve selected thread in
all-stop w/ background execution".  The commit adds a
scoped_restore_current_thread typed variable restore_thread to
fetch_inferior_event, and the hang is caused by the constructor throwing an
exception.

Fix this by catching the exception in the constructor.

Build and reg-tested on x86_64-linux.

gdb/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* thread.c
	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
	exception in get_frame_id.

gdb/testsuite/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* gdb.threads/hang-after-ext-sigkill.c: New test.
	* gdb.threads/hang-after-ext-sigkill.exp: New file.
	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.

---
 gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++
 .../gdb.threads/hang-after-ext-sigkill.exp         | 88 ++++++++++++++++++++++
 gdb/testsuite/lib/gdb.exp                          |  2 +-
 gdb/thread.c                                       | 12 ++-
 4 files changed, 142 insertions(+), 3 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
new file mode 100644
index 0000000000..b93d6c644a
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
@@ -0,0 +1,43 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+static void *
+fun (void *dummy)
+{
+  int i;
+
+  for (i = 0; i < 300; i++)
+    sleep (1);
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  int i;
+  pthread_t thread;
+  pthread_create (&thread, NULL, fun, NULL);
+
+  for (i = 0; i < 300; i++)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
new file mode 100644
index 0000000000..89b38b1f6c
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
@@ -0,0 +1,88 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test-case tests that continuing an inferior that has been killed
+# using an external sigkill does not make gdb hang.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+	 {pthreads}] == -1} {
+    return -1
+}
+
+set res [runto main no-message temporary]
+if { $res != 1 } {
+    return -1
+}
+
+set pid -1
+gdb_test_multiple "info inferior 1" "get inferior pid" {
+    -re -wrap "process (\[0-9\]*).*" {
+       set pid $expect_out(1,string)
+       pass $gdb_test_name
+    }
+}
+if { $pid == -1 } {
+    return -1
+}
+
+gdb_test_multiple "continue" "" {
+    -re "Continuing" {
+	pass $gdb_test_name
+    }
+}
+
+send_gdb "\003"
+
+gdb_test_multiple "" "get sigint" {
+    -re -wrap "received signal SIGINT, Interrupt\..*" {
+       pass $gdb_test_name
+   }
+}
+
+gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
+
+set no_such_process_msg "Couldn't get registers: No such process\."
+set killed_msg "Program terminated with signal SIGKILL, Killed\."
+set no_longer_exists_msg "The program no longer exists\."
+set not_being_run_msg "The program is not being run\."
+
+gdb_test_multiple "continue" "prompt after first continue" {
+    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
+	pass $gdb_test_name
+	# Regular output, bug condition was not triggered, we're done.
+	return -1
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
+	pass $gdb_test_name
+	# Two times $no_such_process_msg.  The bug condition was triggered, go
+	# check for it.
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
+	pass $gdb_test_name
+	# One time $no_such_process_msg.  We're stuck here.  The bug condition
+	# was not triggered, but we're not getting correct gdb behaviour either:
+	# every subsequent continue produces one no_such_process_msg.  Give up.
+	return -1
+    }
+}
+
+gdb_test_multiple "" "messages" {
+    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
+	pass $gdb_test_name
+	gdb_test "continue" $not_being_run_msg "second continue"
+    }
+}
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index e17ac0ef75..4cf2beca00 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -570,7 +570,7 @@ proc runto { function args } {
 	    }
 	    return 1
 	}
-	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
+	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
 	    if { $print_pass } {
 		pass $test_name
 	    }
diff --git a/gdb/thread.c b/gdb/thread.c
index c6e3d356a5..d287bce45f 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
       else
 	frame = NULL;
 
-      m_selected_frame_id = get_frame_id (frame);
-      m_selected_frame_level = frame_relative_level (frame);
+      try
+	{
+	  m_selected_frame_id = get_frame_id (frame);
+	  m_selected_frame_level = frame_relative_level (frame);
+	}
+      catch (const gdb_exception &ex)
+	{
+	  m_selected_frame_id = null_frame_id;
+	  m_selected_frame_level = -1;
+	}
 
       tp->incref ();
       m_thread = tp;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-03-25 15:51       ` Tom de Vries
@ 2020-03-25 15:57         ` Simon Marchi
  2020-04-16 13:28         ` Pedro Alves
  1 sibling, 0 replies; 12+ messages in thread
From: Simon Marchi @ 2020-03-25 15:57 UTC (permalink / raw)
  To: Tom de Vries, gdb-patches; +Cc: Pedro Alves

On 2020-03-25 11:51 a.m., Tom de Vries wrote:
> On 25-03-2020 15:44, Simon Marchi wrote:
>> On 2020-03-25 6:29 a.m., Tom de Vries wrote:
>>> Here's the updated patch.
>> Thanks.  Some comments about the test:
>>
>> - Please add a comment at the top to describe briefly what this is testing.
>> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
>>   so that the test program eventually exits if something goes wrong and it is allowed to run
>>   freely.
> Done.
> 
> Thanks,
> - Tom
> 
> 
> 
> 0001-gdb-Fix-hang-after-ext-sigkill.patch
> 
> [gdb] Fix hang after ext sigkill
> 
> Consider the test-case from this patch, compiled with pthread support:
> ...
> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
> ...
> 
> After running, the program sleeps:
> ...
> $ gdb a.out
> Reading symbols from a.out...
> (gdb) r
> Starting program: /data/gdb_versions/devel/a.out
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff77fe700 (LWP 22604)]
> ...
> 
> Until we interrupt it with a control-C:
> ...
> ^C
> Thread 1 "a.out" received signal SIGINT, Interrupt.
> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
> (gdb)
> ...
> 
> If we then kill the inferior using an external SIGKILL:
> ...
> (gdb) shell killall -s SIGKILL a.out
> ...
> and subsequently continue:
> ...
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> <repeat>
> ...
> gdb hangs repeating the same warning.  Typing control-C no longer helps,
> and we have to kill gdb.
> 
> This is a regression since commit 873657b9e8 "Preserve selected thread in
> all-stop w/ background execution".  The commit adds a
> scoped_restore_current_thread typed variable restore_thread to
> fetch_inferior_event, and the hang is caused by the constructor throwing an
> exception.
> 
> Fix this by catching the exception in the constructor.
> 
> Build and reg-tested on x86_64-linux.
> 
> gdb/ChangeLog:
> 
> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
> 	* thread.c
> 	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
> 	exception in get_frame_id.
> 
> gdb/testsuite/ChangeLog:
> 
> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
> 	* gdb.threads/hang-after-ext-sigkill.c: New test.
> 	* gdb.threads/hang-after-ext-sigkill.exp: New file.
> 	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.

This is fine with me, but I'd prefer Pedro to take a look if he has time.  Given the current
situation, give him a few weeks, if you don't have any answers by then, then push it.

Simon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-03-25 15:51       ` Tom de Vries
  2020-03-25 15:57         ` Simon Marchi
@ 2020-04-16 13:28         ` Pedro Alves
  2020-04-21 12:38           ` Tom de Vries
  1 sibling, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2020-04-16 13:28 UTC (permalink / raw)
  To: Tom de Vries, Simon Marchi, gdb-patches

Hi,

Sorry for the delay, and thanks much for working on this.

On 3/25/20 3:51 PM, Tom de Vries wrote:
> On 25-03-2020 15:44, Simon Marchi wrote:
>> On 2020-03-25 6:29 a.m., Tom de Vries wrote:
>>> Here's the updated patch.
>> Thanks.  Some comments about the test:
>>
>> - Please add a comment at the top to describe briefly what this is testing.
>> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
>>   so that the test program eventually exits if something goes wrong and it is allowed to run
>>   freely.
> Done.
> 0001-gdb-Fix-hang-after-ext-sigkill.patch
> 
> [gdb] Fix hang after ext sigkill
> 
> Consider the test-case from this patch, compiled with pthread support:
> ...
> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
> ...
> 
> After running, the program sleeps:
> ...
> $ gdb a.out
> Reading symbols from a.out...
> (gdb) r
> Starting program: /data/gdb_versions/devel/a.out
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff77fe700 (LWP 22604)]
> ...
> 
> Until we interrupt it with a control-C:
> ...
> ^C
> Thread 1 "a.out" received signal SIGINT, Interrupt.
> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
> (gdb)
> ...
> 
> If we then kill the inferior using an external SIGKILL:
> ...
> (gdb) shell killall -s SIGKILL a.out
> ...
> and subsequently continue:
> ...
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> <repeat>
> ...
> gdb hangs repeating the same warning.  Typing control-C no longer helps,
> and we have to kill gdb.
> 
> This is a regression since commit 873657b9e8 "Preserve selected thread in
> all-stop w/ background execution".  The commit adds a
> scoped_restore_current_thread typed variable restore_thread to
> fetch_inferior_event, and the hang is caused by the constructor throwing an
> exception.
> 
> Fix this by catching the exception in the constructor.
> 
> Build and reg-tested on x86_64-linux.
> 
> gdb/ChangeLog:
> 
> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
> 	* thread.c
> 	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
> 	exception in get_frame_id.
> 
> gdb/testsuite/ChangeLog:
> 
> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
> 	* gdb.threads/hang-after-ext-sigkill.c: New test.
> 	* gdb.threads/hang-after-ext-sigkill.exp: New file.

"hang-after-ext-sigkill" is named in terms of how the
bug manifested (a hang), but once the bug is fixed, it won't
be obvious to remember to look for "hang" when someone goes
look for a testcase related to a process being killed outside of
gdb's control.  Plus, then testcase may be extended in the future
for related bugs that do not cause a hang.

There's a gdb.base/killed-outside.exp testcase already exactly for
this sort of issue -- the testcase does the same thing with killing
with SIGKILL from outside, and then making sure that GDB
behaves.  I'd rather this new testcase was given the same or
a similar name, so that e.g. 'make check TESTS="*/*killed-outside*.exp"'
runs it too.  Or maybe merge the testcases, though it's useful
to run the existing one on non-threaded environments too.
But I'm not sure this one needs to be threaded at all.  Won't
we see the failure to read registers with a single-threaded program
too?

> 	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
> 
> ---
>  gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++
>  .../gdb.threads/hang-after-ext-sigkill.exp         | 88 ++++++++++++++++++++++
>  gdb/testsuite/lib/gdb.exp                          |  2 +-
>  gdb/thread.c                                       | 12 ++-
>  4 files changed, 142 insertions(+), 3 deletions(-)
> 
> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
> new file mode 100644
> index 0000000000..b93d6c644a
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
> @@ -0,0 +1,43 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2020 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#include <pthread.h>
> +#include <unistd.h>
> +
> +static void *
> +fun (void *dummy)
> +{
> +  int i;
> +
> +  for (i = 0; i < 300; i++)
> +    sleep (1);
> +
> +  return NULL;
> +}
> +
> +int
> +main (void)
> +{
> +  int i;
> +  pthread_t thread;
> +  pthread_create (&thread, NULL, fun, NULL);
> +
> +  for (i = 0; i < 300; i++)
> +    sleep (1);
> +
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
> new file mode 100644
> index 0000000000..89b38b1f6c
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
> @@ -0,0 +1,88 @@
> +# Copyright (C) 2020 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# This test-case tests that continuing an inferior that has been killed
> +# using an external sigkill does not make gdb hang.
> +
> +standard_testfile
> +
> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
> +	 {pthreads}] == -1} {
> +    return -1
> +}
> +
> +set res [runto main no-message temporary]
> +if { $res != 1 } {
> +    return -1
> +}
> +
> +set pid -1
> +gdb_test_multiple "info inferior 1" "get inferior pid" {
> +    -re -wrap "process (\[0-9\]*).*" {
> +       set pid $expect_out(1,string)
> +       pass $gdb_test_name
> +    }
> +}

This won't work with remote targets that don't support
the process extensions, since on that case, you'll
get "Remote target" instead of "process $PID".  See
remote_target::pid_to_str.  Likewise probably other
targets.  See gdb.base/killed-outside.exp.


> +if { $pid == -1 } {
> +    return -1
> +}
> +
> +gdb_test_multiple "continue" "" {
> +    -re "Continuing" {
> +	pass $gdb_test_name
> +    }
> +}
> +
> +send_gdb "\003"
> +
> +gdb_test_multiple "" "get sigint" {
> +    -re -wrap "received signal SIGINT, Interrupt\..*" {
> +       pass $gdb_test_name
> +   }
> +}
> +


I don't think interrupting with Ctrl-C is really important,
compared to e.g., running to a breakpoint.  I'd prefer to run
to a breakpoint instead.  E.g., call a all_started(); function
after the child thread is spawned and after a pthread_barrier_wait
call, to make sure the child was scheduled.
See e.g., gdb.threads/async.c.  Then all you need is to
"runto all_started" instead of runto_main.

> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"

This will always kill a process on the host gdb is running on, which
of course does wrong the thing in cross scenarios.  So this should
do instead:

    remote_exec target "kill -9 ${testpid}"

and sleeping a bit is a good idea to make sure the kill is actually
scheduled and does its thing before gdb does the "continue".
See gdb.base/killed-outside.exp.


> +
> +set no_such_process_msg "Couldn't get registers: No such process\."
> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
> +set no_longer_exists_msg "The program no longer exists\."
> +set not_being_run_msg "The program is not being run\."
> +
> +gdb_test_multiple "continue" "prompt after first continue" {
> +    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
> +	pass $gdb_test_name
> +	# Regular output, bug condition was not triggered, we're done.
> +	return -1
> +    }
> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
> +	pass $gdb_test_name
> +	# Two times $no_such_process_msg.  The bug condition was triggered, go
> +	# check for it.
> +    }
> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
> +	pass $gdb_test_name
> +	# One time $no_such_process_msg.  We're stuck here.  The bug condition
> +	# was not triggered, but we're not getting correct gdb behaviour either:
> +	# every subsequent continue produces one no_such_process_msg.  Give up.
> +	return -1

I'm confused here -- the comment says we're not getting correct behavior,
but this won't result in any FAIL?

> +    }
> +}
> +
> +gdb_test_multiple "" "messages" {
> +    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
> +	pass $gdb_test_name
> +	gdb_test "continue" $not_being_run_msg "second continue"
> +    }
> +}

It isn't obvious to me why put this one separately instead of
nested within the pass case in the other gdb_test_multiple above.
Is this also meant to run if the previous gdb_test_multiple fails
due to an internal gdb_test_multiple case being hit,
like the default gdb_prompt match?


> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
> index e17ac0ef75..4cf2beca00 100644
> --- a/gdb/testsuite/lib/gdb.exp
> +++ b/gdb/testsuite/lib/gdb.exp
> @@ -570,7 +570,7 @@ proc runto { function args } {
>  	    }
>  	    return 1
>  	}
> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>  	    if { $print_pass } {
>  		pass $test_name
>  	    }
> diff --git a/gdb/thread.c b/gdb/thread.c
> index c6e3d356a5..d287bce45f 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>        else
>  	frame = NULL;
>  
> -      m_selected_frame_id = get_frame_id (frame);
> -      m_selected_frame_level = frame_relative_level (frame);
> +      try
> +	{
> +	  m_selected_frame_id = get_frame_id (frame);
> +	  m_selected_frame_level = frame_relative_level (frame);
> +	}
> +      catch (const gdb_exception &ex)

This silently swallows Ctrl-C/QUIT too.  That's usually not a good
idea.  gdb_exception_error should be the default choice, unless you
really want to handle Ctrl-C here.

Thanks,
Pedro Alves


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-04-16 13:28         ` Pedro Alves
@ 2020-04-21 12:38           ` Tom de Vries
  2020-04-21 13:42             ` Pedro Alves
  2020-09-25  9:39             ` Tom de Vries
  0 siblings, 2 replies; 12+ messages in thread
From: Tom de Vries @ 2020-04-21 12:38 UTC (permalink / raw)
  To: Pedro Alves, Simon Marchi, gdb-patches

[-- Attachment #1: Type: text/plain, Size: 12258 bytes --]

On 16-04-2020 15:28, Pedro Alves wrote:
> Hi,
> 
> Sorry for the delay, and thanks much for working on this.
> 
> On 3/25/20 3:51 PM, Tom de Vries wrote:
>> On 25-03-2020 15:44, Simon Marchi wrote:
>>> On 2020-03-25 6:29 a.m., Tom de Vries wrote:
>>>> Here's the updated patch.
>>> Thanks.  Some comments about the test:
>>>
>>> - Please add a comment at the top to describe briefly what this is testing.
>>> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
>>>   so that the test program eventually exits if something goes wrong and it is allowed to run
>>>   freely.
>> Done.
>> 0001-gdb-Fix-hang-after-ext-sigkill.patch
>>
>> [gdb] Fix hang after ext sigkill
>>
>> Consider the test-case from this patch, compiled with pthread support:
>> ...
>> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
>> ...
>>
>> After running, the program sleeps:
>> ...
>> $ gdb a.out
>> Reading symbols from a.out...
>> (gdb) r
>> Starting program: /data/gdb_versions/devel/a.out
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [New Thread 0x7ffff77fe700 (LWP 22604)]
>> ...
>>
>> Until we interrupt it with a control-C:
>> ...
>> ^C
>> Thread 1 "a.out" received signal SIGINT, Interrupt.
>> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
>> (gdb)
>> ...
>>
>> If we then kill the inferior using an external SIGKILL:
>> ...
>> (gdb) shell killall -s SIGKILL a.out
>> ...
>> and subsequently continue:
>> ...
>> (gdb) c
>> Continuing.
>> Couldn't get registers: No such process.
>> Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> <repeat>
>> ...
>> gdb hangs repeating the same warning.  Typing control-C no longer helps,
>> and we have to kill gdb.
>>
>> This is a regression since commit 873657b9e8 "Preserve selected thread in
>> all-stop w/ background execution".  The commit adds a
>> scoped_restore_current_thread typed variable restore_thread to
>> fetch_inferior_event, and the hang is caused by the constructor throwing an
>> exception.
>>
>> Fix this by catching the exception in the constructor.
>>
>> Build and reg-tested on x86_64-linux.
>>
>> gdb/ChangeLog:
>>
>> 2020-02-24  Tom de Vries  <tdevries@suse.de>
>>
>> 	PR gdb/25471
>> 	* thread.c
>> 	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
>> 	exception in get_frame_id.
>>
>> gdb/testsuite/ChangeLog:
>>
>> 2020-02-24  Tom de Vries  <tdevries@suse.de>
>>
>> 	PR gdb/25471
>> 	* gdb.threads/hang-after-ext-sigkill.c: New test.
>> 	* gdb.threads/hang-after-ext-sigkill.exp: New file.
> 
> "hang-after-ext-sigkill" is named in terms of how the
> bug manifested (a hang), but once the bug is fixed, it won't
> be obvious to remember to look for "hang" when someone goes
> look for a testcase related to a process being killed outside of
> gdb's control.  Plus, then testcase may be extended in the future
> for related bugs that do not cause a hang.
> 
> There's a gdb.base/killed-outside.exp testcase already exactly for
> this sort of issue -- the testcase does the same thing with killing
> with SIGKILL from outside, and then making sure that GDB
> behaves.  I'd rather this new testcase was given the same or
> a similar name, so that e.g. 'make check TESTS="*/*killed-outside*.exp"'
> runs it too.

Ack, test-case renamed to gdb.threads/killed-outside.{exp,c}.

>  Or maybe merge the testcases, though it's useful
> to run the existing one on non-threaded environments too.
> But I'm not sure this one needs to be threaded at all.  Won't
> we see the failure to read registers with a single-threaded program
> too?
> 

I was not able to reproduce the hang with a single-threaded program.

>> 	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
>>
>> ---
>>  gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++
>>  .../gdb.threads/hang-after-ext-sigkill.exp         | 88 ++++++++++++++++++++++
>>  gdb/testsuite/lib/gdb.exp                          |  2 +-
>>  gdb/thread.c                                       | 12 ++-
>>  4 files changed, 142 insertions(+), 3 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> new file mode 100644
>> index 0000000000..b93d6c644a
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> @@ -0,0 +1,43 @@
>> +/* This testcase is part of GDB, the GNU debugger.
>> +
>> +   Copyright 2020 Free Software Foundation, Inc.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +#include <pthread.h>
>> +#include <unistd.h>
>> +
>> +static void *
>> +fun (void *dummy)
>> +{
>> +  int i;
>> +
>> +  for (i = 0; i < 300; i++)
>> +    sleep (1);
>> +
>> +  return NULL;
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int i;
>> +  pthread_t thread;
>> +  pthread_create (&thread, NULL, fun, NULL);
>> +
>> +  for (i = 0; i < 300; i++)
>> +    sleep (1);
>> +
>> +  return 0;
>> +}
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> new file mode 100644
>> index 0000000000..89b38b1f6c
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> @@ -0,0 +1,88 @@
>> +# Copyright (C) 2020 Free Software Foundation, Inc.
>> +
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +# This test-case tests that continuing an inferior that has been killed
>> +# using an external sigkill does not make gdb hang.
>> +
>> +standard_testfile
>> +
>> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
>> +	 {pthreads}] == -1} {
>> +    return -1
>> +}
>> +
>> +set res [runto main no-message temporary]
>> +if { $res != 1 } {
>> +    return -1
>> +}
>> +
>> +set pid -1
>> +gdb_test_multiple "info inferior 1" "get inferior pid" {
>> +    -re -wrap "process (\[0-9\]*).*" {
>> +       set pid $expect_out(1,string)
>> +       pass $gdb_test_name
>> +    }
>> +}
> 
> This won't work with remote targets that don't support
> the process extensions, since on that case, you'll
> get "Remote target" instead of "process $PID".  See
> remote_target::pid_to_str.  Likewise probably other
> targets.  See gdb.base/killed-outside.exp.
> 
> 

Ack, updated accordingly.

>> +if { $pid == -1 } {
>> +    return -1
>> +}
>> +
>> +gdb_test_multiple "continue" "" {
>> +    -re "Continuing" {
>> +	pass $gdb_test_name
>> +    }
>> +}
>> +
>> +send_gdb "\003"
>> +
>> +gdb_test_multiple "" "get sigint" {
>> +    -re -wrap "received signal SIGINT, Interrupt\..*" {
>> +       pass $gdb_test_name
>> +   }
>> +}
>> +
> 
> 
> I don't think interrupting with Ctrl-C is really important,
> compared to e.g., running to a breakpoint.  I'd prefer to run
> to a breakpoint instead.  E.g., call a all_started(); function
> after the child thread is spawned and after a pthread_barrier_wait
> call, to make sure the child was scheduled.
> See e.g., gdb.threads/async.c.  Then all you need is to
> "runto all_started" instead of runto_main.
> 

Done.

>> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
> 
> This will always kill a process on the host gdb is running on, which
> of course does wrong the thing in cross scenarios.  So this should
> do instead:
> 
>     remote_exec target "kill -9 ${testpid}"
> 
> and sleeping a bit is a good idea to make sure the kill is actually
> scheduled and does its thing before gdb does the "continue".
> See gdb.base/killed-outside.exp.
> 
> 

Ack, done.

>> +
>> +set no_such_process_msg "Couldn't get registers: No such process\."
>> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
>> +set no_longer_exists_msg "The program no longer exists\."
>> +set not_being_run_msg "The program is not being run\."
>> +
>> +gdb_test_multiple "continue" "prompt after first continue" {
>> +    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
>> +	pass $gdb_test_name
>> +	# Regular output, bug condition was not triggered, we're done.
>> +	return -1
>> +    }
>> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
>> +	pass $gdb_test_name
>> +	# Two times $no_such_process_msg.  The bug condition was triggered, go
>> +	# check for it.
>> +    }
>> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
>> +	pass $gdb_test_name
>> +	# One time $no_such_process_msg.  We're stuck here.  The bug condition
>> +	# was not triggered, but we're not getting correct gdb behaviour either:
>> +	# every subsequent continue produces one no_such_process_msg.  Give up.
>> +	return -1
> 
> I'm confused here -- the comment says we're not getting correct behavior,
> but this won't result in any FAIL?
> 

Um, yes. I made that decision because I was trying to trigger another
scenario. But agreed, it's debatable. Anyway, it also doesn't matter
anymore since rewriting the test to stop at a breakpoint eliminated the
need for this. The scenario I was trying to trigger now always reproduces.

>> +    }
>> +}
>> +
>> +gdb_test_multiple "" "messages" {
>> +    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
>> +	pass $gdb_test_name
>> +	gdb_test "continue" $not_being_run_msg "second continue"
>> +    }
>> +}
> 
> It isn't obvious to me why put this one separately instead of
> nested within the pass case in the other gdb_test_multiple above.
> Is this also meant to run if the previous gdb_test_multiple fails
> due to an internal gdb_test_multiple case being hit,
> like the default gdb_prompt match?
> 

Fixed.

> 
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index e17ac0ef75..4cf2beca00 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -570,7 +570,7 @@ proc runto { function args } {
>>  	    }
>>  	    return 1
>>  	}
>> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
>> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>>  	    if { $print_pass } {
>>  		pass $test_name
>>  	    }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index c6e3d356a5..d287bce45f 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>>        else
>>  	frame = NULL;
>>  
>> -      m_selected_frame_id = get_frame_id (frame);
>> -      m_selected_frame_level = frame_relative_level (frame);
>> +      try
>> +	{
>> +	  m_selected_frame_id = get_frame_id (frame);
>> +	  m_selected_frame_level = frame_relative_level (frame);
>> +	}
>> +      catch (const gdb_exception &ex)
> 
> This silently swallows Ctrl-C/QUIT too.  That's usually not a good
> idea.  gdb_exception_error should be the default choice, unless you
> really want to handle Ctrl-C here.
> 

Fixed.

Thanks for the review, that was helpful.

Any more comments?

- Tom

[-- Attachment #2: 0001-gdb-Fix-hang-after-ext-sigkill.patch --]
[-- Type: text/x-patch, Size: 6699 bytes --]

[gdb] Fix hang after ext sigkill

Consider the test-case from this patch, compiled with pthread support:
...
$ gcc gdb/testsuite/gdb.threads/killed-outside.c -lpthread -g
...

After running to all_started, we can print pid:
...
$ gdb a.out -ex "b all_started" -ex run -ex "delete 1" -ex "p pid"
...
Reading symbols from a.out...
Breakpoint 1 at 0x40072b: file killed-outside.c, line 29.
Starting program: /data/gdb_versions/devel/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77fc700 (LWP 3155)]

Thread 1 "a.out" hit Breakpoint 1, all_started () at killed-outside.c:29
29      }
$1 = 3151
(gdb)
...

If we then kill the inferior using an external SIGKILL:
...
(gdb) shell kill -9 3151
...
and subsequently continue:
...
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
<repeat>
...
gdb hangs repeating the same warning.  Typing control-C no longer helps,
and we have to kill gdb.

This is a regression since commit 873657b9e8 "Preserve selected thread in
all-stop w/ background execution".  The commit adds a
scoped_restore_current_thread typed variable restore_thread to
fetch_inferior_event, and the hang is caused by the constructor throwing an
exception.

Fix this by catching the exception in the constructor.

Build and reg-tested on x86_64-linux.

gdb/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* thread.c
	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
	exception in get_frame_id.

gdb/testsuite/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
        * gdb.threads/killed-outside.c: New test.
        * gdb.threads/killed-outside.exp: New file.

---
 gdb/testsuite/gdb.threads/killed-outside.c   | 64 ++++++++++++++++++++++++++++
 gdb/testsuite/gdb.threads/killed-outside.exp | 57 +++++++++++++++++++++++++
 gdb/thread.c                                 | 12 +++++-
 3 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/killed-outside.c b/gdb/testsuite/gdb.threads/killed-outside.c
new file mode 100644
index 0000000000..ac4cf0b07a
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/killed-outside.c
@@ -0,0 +1,64 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+pid_t pid;
+
+static pthread_barrier_t threads_started_barrier;
+
+static void
+all_started (void)
+{
+}
+
+static void *
+fun (void *dummy)
+{
+  int i;
+
+  pthread_barrier_wait (&threads_started_barrier);
+
+  for (i = 0; i < 180; i++)
+    sleep (1);
+
+  pthread_exit (NULL);
+}
+
+int
+main (void)
+{
+  int i;
+  pthread_t thread;
+
+  pid = getpid ();
+
+  pthread_barrier_init (&threads_started_barrier, NULL, 2);
+
+  pthread_create (&thread, NULL, fun, NULL);
+
+  pthread_barrier_wait (&threads_started_barrier);
+
+  all_started ();
+
+  for (i = 0; i < 180; i++)
+    sleep (1);
+
+  exit (EXIT_SUCCESS);
+}
diff --git a/gdb/testsuite/gdb.threads/killed-outside.exp b/gdb/testsuite/gdb.threads/killed-outside.exp
new file mode 100644
index 0000000000..ff5a115728
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/killed-outside.exp
@@ -0,0 +1,57 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test-case tests that continuing an inferior that has been killed
+# using an external sigkill does not make gdb misbehave.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+	 {pthreads debug}] == -1} {
+    return -1
+}
+
+if { ![runto "all_started"] } {
+    return -1
+}
+delete_breakpoints
+
+set testpid [get_valueof "" "pid" -1 "get pid of inferior"]
+if { $testpid == -1 } {
+    return -1
+}
+remote_exec target "kill -9 ${testpid}"
+
+# Give it some time to die.
+sleep 2
+
+set no_such_process_msg "Couldn't get registers: No such process\."
+set killed_msg "Program terminated with signal SIGKILL, Killed\."
+set no_longer_exists_msg "The program no longer exists\."
+set not_being_run_msg "The program is not being run\."
+
+gdb_test_multiple "continue" "prompt after first continue" {
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
+	pass $gdb_test_name
+	# Two times $no_such_process_msg.  The bug condition was triggered, go
+	# check for it.
+	gdb_test_multiple "" "messages" {
+	    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
+		pass $gdb_test_name
+		gdb_test "continue" $not_being_run_msg "second continue"
+	    }
+	}
+    }
+}
diff --git a/gdb/thread.c b/gdb/thread.c
index c6e3d356a5..03805bd256 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
       else
 	frame = NULL;
 
-      m_selected_frame_id = get_frame_id (frame);
-      m_selected_frame_level = frame_relative_level (frame);
+      try
+	{
+	  m_selected_frame_id = get_frame_id (frame);
+	  m_selected_frame_level = frame_relative_level (frame);
+	}
+      catch (const gdb_exception_error &ex)
+	{
+	  m_selected_frame_id = null_frame_id;
+	  m_selected_frame_level = -1;
+	}
 
       tp->incref ();
       m_thread = tp;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-04-21 12:38           ` Tom de Vries
@ 2020-04-21 13:42             ` Pedro Alves
  2020-09-25  9:39             ` Tom de Vries
  1 sibling, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2020-04-21 13:42 UTC (permalink / raw)
  To: Tom de Vries, Simon Marchi, gdb-patches

On 4/21/20 1:38 PM, Tom de Vries wrote:
> On 16-04-2020 15:28, Pedro Alves wrote
> 
> Thanks for the review, that was helpful.
> 
> Any more comments?
> 

Nope, this version looks good to me.

> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
>         * gdb.threads/killed-outside.c: New test.
>         * gdb.threads/killed-outside.exp: New file.

Note tabs vs spaces here.

Thanks,
Pedro Alves


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH][gdb] Fix hang after ext sigkill
  2020-04-21 12:38           ` Tom de Vries
  2020-04-21 13:42             ` Pedro Alves
@ 2020-09-25  9:39             ` Tom de Vries
  1 sibling, 0 replies; 12+ messages in thread
From: Tom de Vries @ 2020-09-25  9:39 UTC (permalink / raw)
  To: Pedro Alves, Simon Marchi, gdb-patches

On 4/21/20 2:38 PM, Tom de Vries wrote:
>>> +
>>> +set no_such_process_msg "Couldn't get registers: No such process\."
>>> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
>>> +set no_longer_exists_msg "The program no longer exists\."
>>> +set not_being_run_msg "The program is not being run\."
>>> +
>>> +gdb_test_multiple "continue" "prompt after first continue" {
>>> +    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
>>> +	pass $gdb_test_name
>>> +	# Regular output, bug condition was not triggered, we're done.
>>> +	return -1
>>> +    }
>>> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
>>> +	pass $gdb_test_name
>>> +	# Two times $no_such_process_msg.  The bug condition was triggered, go
>>> +	# check for it.
>>> +    }
>>> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
>>> +	pass $gdb_test_name
>>> +	# One time $no_such_process_msg.  We're stuck here.  The bug condition
>>> +	# was not triggered, but we're not getting correct gdb behaviour either:
>>> +	# every subsequent continue produces one no_such_process_msg.  Give up.
>>> +	return -1
>> I'm confused here -- the comment says we're not getting correct behavior,
>> but this won't result in any FAIL?
>>
> Um, yes. I made that decision because I was trying to trigger another
> scenario. But agreed, it's debatable. Anyway, it also doesn't matter
> anymore since rewriting the test to stop at a breakpoint eliminated the
> need for this. The scenario I was trying to trigger now always reproduces.
> 

I actually ran into the "One time $no_such_process_msg." scenario again,
filed at https://sourceware.org/bugzilla/show_bug.cgi?id=26664 .

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-09-25  9:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-24 20:14 [PATCH][gdb] Fix hang after ext sigkill Tom de Vries
2020-03-09 12:52 ` [PING][PATCH][gdb] " Tom de Vries
2020-03-23 19:16   ` [PING^2][PATCH][gdb] " Tom de Vries
2020-03-24 15:35 ` [PATCH][gdb] " Simon Marchi
2020-03-25 10:29   ` Tom de Vries
2020-03-25 14:44     ` Simon Marchi
2020-03-25 15:51       ` Tom de Vries
2020-03-25 15:57         ` Simon Marchi
2020-04-16 13:28         ` Pedro Alves
2020-04-21 12:38           ` Tom de Vries
2020-04-21 13:42             ` Pedro Alves
2020-09-25  9:39             ` Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).