public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Fix crash of gdbserver when kill threads
@ 2014-06-23 10:10 Hui Zhu
  2014-06-29  3:28 ` Hui Zhu
  0 siblings, 1 reply; 10+ messages in thread
From: Hui Zhu @ 2014-06-23 10:10 UTC (permalink / raw)
  To: gdb-patches ml

gdbserver :1234 gdb.base/watch_thread_num
gdb gdb.base/watch_thread_num
(gdb) b 48
Breakpoint 1 at 0x400737: file ../../../binutils-gdb/gdb/testsuite/gdb.base/watch_thread_num.c, line 48.
(gdb) c
Continuing.

Breakpoint 1, main () at ../../../binutils-gdb/gdb/testsuite/gdb.base/watch_thread_num.c:48
48	    thread_result = thread_function ((void *) i);
(gdb) k
Kill the program being debugged? (y or n) y
gdbserver :1234 gdb.base/watch_thread_num
Process gdb.base/watch_thread_num created; pid = 9719
Listening on port 1234
Remote debugging from host 127.0.0.1
Killing all inferiors
Segmentation fault (core dumped)

Backtrace:
(gdb) bt
#0  find_inferior (list=<optimized out>, func=func@entry=0x423990 <kill_one_lwp_callback>, arg=arg@entry=0x7fffe97405dc)
    at ../../../binutils-gdb/gdb/gdbserver/inferiors.c:199
#1  0x0000000000425bff in linux_kill (pid=10130) at ../../../binutils-gdb/gdb/gdbserver/linux-low.c:966
#2  0x000000000040ae8c in kill_inferior_callback (entry=<optimized out>) at ../../../binutils-gdb/gdb/gdbserver/server.c:2934
#3  0x0000000000405c61 in for_each_inferior (list=<optimized out>, action=action@entry=0x40ae60 <kill_inferior_callback>)
    at ../../../binutils-gdb/gdb/gdbserver/inferiors.c:57
#4  0x000000000040d5e2 in process_serial_event () at ../../../binutils-gdb/gdb/gdbserver/server.c:3767
#5  handle_serial_event (err=<optimized out>, client_data=<optimized out>) at ../../../binutils-gdb/gdb/gdbserver/server.c:3880
#6  0x0000000000412cda in handle_file_event (event_file_desc=event_file_desc@entry=4)
    at ../../../binutils-gdb/gdb/gdbserver/event-loop.c:434
#7  0x000000000041357a in process_event () at ../../../binutils-gdb/gdb/gdbserver/event-loop.c:189
#8  start_event_loop () at ../../../binutils-gdb/gdb/gdbserver/event-loop.c:552
#9  0x0000000000403088 in main (argc=3, argv=0x7fffe9740938) at ../../../binutils-gdb/gdb/gdbserver/server.c:3283

The cause of this issue is when linux_kill call "find_inferior (&all_threads, kill_one_lwp_callback , &pid)"
to kill all the lwp of pid.
In linux_wait_for_event, it will delete_lwp any lwp in all_threads if it
get exit event of it.  Then it make find_inferior crash.

I make a patch that let kill_one_lwp_callback return 1, then after
linux_wait_for_event is called(Maybe all_threads is changed), find_inferior
will return.
And change call "find_inferior (&all_threads, kill_one_lwp_callback , &pid)"
to be a loop.  It will stop when all_threads doesn't have any lwp is pid.

It pass regression test in x86_64 Linux. 

Thanks,
Hui

2014-06-23  Hui Zhu  <hui@codesourcery.com>

	* linux-low.c (kill_one_lwp_callback): Change last return to 1.
	(linux_kill): Call find_inferior with a loop.

--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l
       pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
     } while (pid > 0 && WIFSTOPPED (wstat));
 
-  return 0;
+  /* Let find_inferior return because maybe other lwp in the list will
+     be deleted by delete_lwp.  */
+  return 1;
 }
 
 static int
@@ -963,7 +965,9 @@ linux_kill (int pid)
      first, as PTRACE_KILL will not work otherwise.  */
   stop_all_lwps (0, NULL);
 
-  find_inferior (&all_threads, kill_one_lwp_callback , &pid);
+  /* Keep call kill_one_lwp_callback until find_inferior cannot find any
+     lwps that is for pid.  */
+  while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL);
 
   /* See the comment in linux_kill_one_lwp.  We did not kill the first
      thread in the list, so do so now.  */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix crash of gdbserver when kill threads
  2014-06-23 10:10 [PATCH] Fix crash of gdbserver when kill threads Hui Zhu
@ 2014-06-29  3:28 ` Hui Zhu
  2014-07-02  9:07   ` Pedro Alves
  0 siblings, 1 reply; 10+ messages in thread
From: Hui Zhu @ 2014-06-29  3:28 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches ml

Hi Pedro,

This issue is introduced by the patches for PR 12702 that was pushed by you.
Maybe you could take a look on this patch.

Thanks,
Hui

On 06/23/14 18:09, Hui Zhu wrote:
> gdbserver :1234 gdb.base/watch_thread_num
> gdb gdb.base/watch_thread_num
> (gdb) b 48
> Breakpoint 1 at 0x400737: file ../../../binutils-gdb/gdb/testsuite/gdb.base/watch_thread_num.c, line 48.
> (gdb) c
> Continuing.
>
> Breakpoint 1, main () at ../../../binutils-gdb/gdb/testsuite/gdb.base/watch_thread_num.c:48
> 48	    thread_result = thread_function ((void *) i);
> (gdb) k
> Kill the program being debugged? (y or n) y
> gdbserver :1234 gdb.base/watch_thread_num
> Process gdb.base/watch_thread_num created; pid = 9719
> Listening on port 1234
> Remote debugging from host 127.0.0.1
> Killing all inferiors
> Segmentation fault (core dumped)
>
> Backtrace:
> (gdb) bt
> #0  find_inferior (list=<optimized out>, func=func@entry=0x423990 <kill_one_lwp_callback>, arg=arg@entry=0x7fffe97405dc)
>      at ../../../binutils-gdb/gdb/gdbserver/inferiors.c:199
> #1  0x0000000000425bff in linux_kill (pid=10130) at ../../../binutils-gdb/gdb/gdbserver/linux-low.c:966
> #2  0x000000000040ae8c in kill_inferior_callback (entry=<optimized out>) at ../../../binutils-gdb/gdb/gdbserver/server.c:2934
> #3  0x0000000000405c61 in for_each_inferior (list=<optimized out>, action=action@entry=0x40ae60 <kill_inferior_callback>)
>      at ../../../binutils-gdb/gdb/gdbserver/inferiors.c:57
> #4  0x000000000040d5e2 in process_serial_event () at ../../../binutils-gdb/gdb/gdbserver/server.c:3767
> #5  handle_serial_event (err=<optimized out>, client_data=<optimized out>) at ../../../binutils-gdb/gdb/gdbserver/server.c:3880
> #6  0x0000000000412cda in handle_file_event (event_file_desc=event_file_desc@entry=4)
>      at ../../../binutils-gdb/gdb/gdbserver/event-loop.c:434
> #7  0x000000000041357a in process_event () at ../../../binutils-gdb/gdb/gdbserver/event-loop.c:189
> #8  start_event_loop () at ../../../binutils-gdb/gdb/gdbserver/event-loop.c:552
> #9  0x0000000000403088 in main (argc=3, argv=0x7fffe9740938) at ../../../binutils-gdb/gdb/gdbserver/server.c:3283
>
> The cause of this issue is when linux_kill call "find_inferior (&all_threads, kill_one_lwp_callback , &pid)"
> to kill all the lwp of pid.
> In linux_wait_for_event, it will delete_lwp any lwp in all_threads if it
> get exit event of it.  Then it make find_inferior crash.
>
> I make a patch that let kill_one_lwp_callback return 1, then after
> linux_wait_for_event is called(Maybe all_threads is changed), find_inferior
> will return.
> And change call "find_inferior (&all_threads, kill_one_lwp_callback , &pid)"
> to be a loop.  It will stop when all_threads doesn't have any lwp is pid.
>
> It pass regression test in x86_64 Linux.
>
> Thanks,
> Hui
>
> 2014-06-23  Hui Zhu  <hui@codesourcery.com>
>
> 	* linux-low.c (kill_one_lwp_callback): Change last return to 1.
> 	(linux_kill): Call find_inferior with a loop.
>
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l
>         pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
>       } while (pid > 0 && WIFSTOPPED (wstat));
>
> -  return 0;
> +  /* Let find_inferior return because maybe other lwp in the list will
> +     be deleted by delete_lwp.  */
> +  return 1;
>   }
>
>   static int
> @@ -963,7 +965,9 @@ linux_kill (int pid)
>        first, as PTRACE_KILL will not work otherwise.  */
>     stop_all_lwps (0, NULL);
>
> -  find_inferior (&all_threads, kill_one_lwp_callback , &pid);
> +  /* Keep call kill_one_lwp_callback until find_inferior cannot find any
> +     lwps that is for pid.  */
> +  while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL);
>
>     /* See the comment in linux_kill_one_lwp.  We did not kill the first
>        thread in the list, so do so now.  */
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix crash of gdbserver when kill threads
  2014-06-29  3:28 ` Hui Zhu
@ 2014-07-02  9:07   ` Pedro Alves
  2014-07-10 15:17     ` [PATCH v2] GDBserver crashes when killing a multi-thread process Pedro Alves
  0 siblings, 1 reply; 10+ messages in thread
From: Pedro Alves @ 2014-07-02  9:07 UTC (permalink / raw)
  To: Hui Zhu, gdb-patches ml

On 06/29/2014 04:28 AM, Hui Zhu wrote:
> Hi Pedro,
> 
> This issue is introduced by the patches for PR 12702 that was pushed by you.
> Maybe you could take a look on this patch.

Thanks.  I'll need to think a bit about whether this is the right
solution.

I've added it to the 7.8 regression queue at:

 https://sourceware.org/gdb/wiki/GDB_7.8_Release

I'll look better at it as soon as I have a chance.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] GDBserver crashes when killing a multi-thread process
  2014-07-02  9:07   ` Pedro Alves
@ 2014-07-10 15:17     ` Pedro Alves
  2014-07-11  8:21       ` Hui Zhu
  2015-07-13 16:07       ` Yao Qi
  0 siblings, 2 replies; 10+ messages in thread
From: Pedro Alves @ 2014-07-10 15:17 UTC (permalink / raw)
  To: Hui Zhu, gdb-patches ml

Hi Hui,

On 07/02/2014 10:07 AM, Pedro Alves wrote:
>> This issue is introduced by the patches for PR 12702 that was pushed by you.
>> Maybe you could take a look on this patch.
> 
> Thanks.  I'll need to think a bit about whether this is the right
> solution.
> 
> I've added it to the 7.8 regression queue at:
> 
>  https://sourceware.org/gdb/wiki/GDB_7.8_Release
> 
> I'll look better at it as soon as I have a chance.
> 

First, it's really annoying that the testsuite doesn't catch this!
We're really missing such a basic test, so I went ahead and wrote one.
I was actually quite surprised that we didn't have any
test that makes sure "kill" works without complaints/spurious output.

However, unfortunately, the new test revealed that the fix isn't right.  :-/

Sometimes the test passed, but other times (that'll depend on
machine, of course), I'd get:

 kill
 Kill the program being debugged? (y or n) y
 Ignoring packet error, continuing...
 (gdb) FAIL: gdb.threads/kill.exp: kill

That means GDBserver didn't reply to the vKill packet.

Attaching to GDBserver at this point shows that it's stuck forever
in the new loop:


 +  /* Keep call kill_one_lwp_callback until find_inferior cannot find any
 +     lwps that is for pid.  */
 +  while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL);


The reason was that the leader lwp goes zombie before the other
lwps are reaped, and linux_wait_for_event would delete it.  We'd be
left with e.g., only one lwp that is _not_ the leader.

That lwp had disappeared already (due to SIGKILL), but there's
no status to collect (waitpid would return -1).  Nothing in linux-low.c
would notice the lwp is already gone.  Because that lwp has lwp->stopped == 0,
linux_wait_for_event would return, here:

      if ((find_inferior (&all_threads,
			  not_stopped_callback,
			  &wait_ptid) == NULL))
	{
	  if (debug_threads)
	    debug_printf ("LLW: exit (no unwaited-for LWP)\n");
	  sigprocmask (SIG_SETMASK, &prev_mask, NULL);
	  return -1;
	}

So this would return true:

 --- a/gdb/gdbserver/linux-low.c
 +++ b/gdb/gdbserver/linux-low.c
 @@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l
       pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
     } while (pid > 0 && WIFSTOPPED (wstat));
 
 -  return 0;
 +  /* Let find_inferior return because maybe other lwp in the list will
 +     be deleted by delete_lwp.  */
 +  return 1;
  }

So the new loop would never break.

Here's a different patch that passes the test cleanly for me.
Let me know if it works for you.

8<-------------------
From ed3e42e5447e0276d6fc759a27408d06e9ccf0b7 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 10 Jul 2014 15:13:26 +0100
Subject: [PATCH] GDBserver crashes when killing a multi-thread process

Here's an example, with the new test:

 gdbserver :9999 gdb.threads/kill
 gdb gdb.threads/kill
 (gdb) b 52
 Breakpoint 1 at 0x4007f4: file kill.c, line 52.
 Continuing.

 Breakpoint 1, main () at kill.c:52
 52        return 0; /* set break here */
 (gdb) k
 Kill the program being debugged? (y or n) y

 gdbserver :9999 gdb.threads/kill
 Process gdb.base/watch_thread_num created; pid = 9719
 Listening on port 1234
 Remote debugging from host 127.0.0.1
 Killing all inferiors
 Segmentation fault (core dumped)

Backtrace:

 (gdb) bt
 #0  0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
 #1  0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
 #2  0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
 #3  0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
 #4  0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
 #5  0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
 #6  0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
 #7  0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
 #8  0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
 #9  0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
 #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283

The problem is that linux_wait_for_event deletes lwps that have exited
(even those not passed in as lwps of interest), while the lwp/thread
list is being walked on with find_inferior.  find_inferior can handle
the current iterated inferior being deleted, but not others.

When killing lwps, we don't really care about any of the pending
status handling of linux_wait_for_event.  We can just waitpid the lwps
directly, which is also what GDB does (see
linux-nat.c:kill_wait_callback).  This way the lwps are not deleted
while we're walking the list.  They'll be deleted by linux_mourn
afterwards.

This crash triggers several times when running the testsuite against
GDBserver with the native-gdbserver board (target remote), but as GDB
can't distinguish between GDBserver crashing and "kill" being
sucessful, as in both cases the connection is closed (the 'k' packet
doesn't require a reply), and the inferior is gone, that results in no
FAIL.

The patch adds a generic test that catches the issue with
extended-remote mode (and works fine with native testing too).  Here's
how it fails with the native-extended-gdbserver board without the fix:

 (gdb) info threads
   Id   Target Id         Frame
   6    Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   5    Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   4    Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   3    Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   2    Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 * 1    Thread 15367.15367 main () at .../gdb.threads/kill.c:52
 (gdb) kill
 Kill the program being debugged? (y or n) y
 Remote connection closed
 ^^^^^^^^^^^^^^^^^^^^^^^^
 (gdb) FAIL: gdb.threads/kill.exp: kill

Extended remote should remain connected after the kill.

gdb/gdbserver/
2014-07-10  Pedro Alves  <palves@redhat.com>

	* linux-low.c (kill_wait_lwp): New function, based on
	kill_one_lwp_callback, but use my_waitpid directly.
	(kill_one_lwp_callback, linux_kill): Use it.

gdb/testsuite/
2014-07-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/kill.c: New file.
	* gdb.threads/kill.exp: New file.
---
 gdb/gdbserver/linux-low.c          | 68 ++++++++++++++++++++-------------
 gdb/testsuite/gdb.threads/kill.c   | 64 +++++++++++++++++++++++++++++++
 gdb/testsuite/gdb.threads/kill.exp | 77 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 183 insertions(+), 26 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/kill.c
 create mode 100644 gdb/testsuite/gdb.threads/kill.exp

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 61552f4..bdc9aaa 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -909,6 +909,46 @@ linux_kill_one_lwp (struct lwp_info *lwp)
 		  errno ? strerror (errno) : "OK");
 }
 
+/* Kill LWP and wait for it to die.  */
+
+static void
+kill_wait_lwp (struct lwp_info *lwp)
+{
+  struct thread_info *thr = get_lwp_thread (lwp);
+  int pid = ptid_get_pid (ptid_of (thr));
+  int lwpid = ptid_get_lwp (ptid_of (thr));
+  int wstat;
+  int res;
+
+  if (debug_threads)
+    debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid);
+
+  do
+    {
+      linux_kill_one_lwp (lwp);
+
+      /* Make sure it died.  Notes:
+
+	 - The loop is most likely unnecessary.
+
+         - We don't use linux_wait_for_event as that could delete lwps
+           while we're iterating over them.  We're not interested in
+           any pending status at this point, only in making sure all
+           wait status on the kernel side are collected until the
+           process is reaped.
+
+	 - We don't use __WALL here as the __WALL emulation relies on
+	   SIGCHLD, and killing a stopped process doesn't generate
+	   one, nor an exit status.
+      */
+      res = my_waitpid (lwpid, &wstat, 0);
+      if (res == -1 && errno == ECHILD)
+	res = my_waitpid (lwpid, &wstat, __WCLONE);
+    } while (res > 0 && WIFSTOPPED (wstat));
+
+  gdb_assert (res > 0);
+}
+
 /* Callback for `find_inferior'.  Kills an lwp of a given process,
    except the leader.  */
 
@@ -917,7 +957,6 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
 {
   struct thread_info *thread = (struct thread_info *) entry;
   struct lwp_info *lwp = get_thread_lwp (thread);
-  int wstat;
   int pid = * (int *) args;
 
   if (ptid_get_pid (entry->id) != pid)
@@ -936,14 +975,7 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
       return 0;
     }
 
-  do
-    {
-      linux_kill_one_lwp (lwp);
-
-      /* Make sure it died.  The loop is most likely unnecessary.  */
-      pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
-    } while (pid > 0 && WIFSTOPPED (wstat));
-
+  kill_wait_lwp (lwp);
   return 0;
 }
 
@@ -952,8 +984,6 @@ linux_kill (int pid)
 {
   struct process_info *process;
   struct lwp_info *lwp;
-  int wstat;
-  int lwpid;
 
   process = find_process_pid (pid);
   if (process == NULL)
@@ -976,21 +1006,7 @@ linux_kill (int pid)
 		      pid);
     }
   else
-    {
-      struct thread_info *thr = get_lwp_thread (lwp);
-
-      if (debug_threads)
-	debug_printf ("lk_1: killing lwp %ld, for pid: %d\n",
-		      lwpid_of (thr), pid);
-
-      do
-	{
-	  linux_kill_one_lwp (lwp);
-
-	  /* Make sure it died.  The loop is most likely unnecessary.  */
-	  lwpid = linux_wait_for_event (thr->entry.id, &wstat, __WALL);
-	} while (lwpid > 0 && WIFSTOPPED (wstat));
-    }
+    kill_wait_lwp (lwp);
 
   the_target->mourn (process);
 
diff --git a/gdb/testsuite/gdb.threads/kill.c b/gdb/testsuite/gdb.threads/kill.c
new file mode 100644
index 0000000..aefbb06
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/kill.c
@@ -0,0 +1,64 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2014 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifdef USE_THREADS
+
+#include <unistd.h>
+#include <pthread.h>
+
+#define NUM 5
+
+pthread_barrier_t barrier;
+
+void *
+thread_function (void *arg)
+{
+  volatile unsigned int counter = 1;
+
+  pthread_barrier_wait (&barrier);
+
+  while (counter > 0)
+    {
+      counter++;
+      usleep (1);
+    }
+
+  pthread_exit (NULL);
+}
+
+#endif /* USE_THREADS */
+
+void
+setup (void)
+{
+#ifdef USE_THREADS
+  pthread_t threads[NUM];
+  int i;
+
+  pthread_barrier_init (&barrier, NULL, NUM + 1);
+  for (i = 0; i < NUM; i++)
+    pthread_create (&threads[i], NULL, thread_function, NULL);
+  pthread_barrier_wait (&barrier);
+#endif /* USE_THREADS */
+}
+
+int
+main (void)
+{
+  setup ();
+  return 0; /* set break here */
+}
diff --git a/gdb/testsuite/gdb.threads/kill.exp b/gdb/testsuite/gdb.threads/kill.exp
new file mode 100644
index 0000000..a3f921f
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/kill.exp
@@ -0,0 +1,77 @@
+# This testcase is part of GDB, the GNU debugger.
+
+# Copyright 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+standard_testfile
+
+# Run the test proper.  THREADED indicates whether to build a threaded
+# program and spawn several threads before trying to kill the program.
+
+proc test {threaded} {
+    global testfile srcfile
+
+    with_test_prefix [expr ($threaded)?"threaded":"non-threaded"] {
+
+	set options {debug}
+	if {$threaded} {
+	    lappend options "pthreads"
+	    lappend options "additional_flags=-DUSE_THREADS"
+	    set prog ${testfile}_threads
+	} else {
+	    set prog ${testfile}_nothreads
+	}
+
+	if {[prepare_for_testing "failed to prepare" $prog $srcfile $options] == -1} {
+	    return -1
+	}
+
+	if { ![runto main] } then {
+	    fail "run to main"
+	    return
+	}
+
+	set linenum [gdb_get_line_number "set break here"]
+	gdb_breakpoint "$srcfile:$linenum"
+	gdb_continue_to_breakpoint "break here" ".*break here.*"
+
+	if {$threaded} {
+	    gdb_test "info threads" "6.*5.*4.*3.*2.*1.*" "all threads started"
+	}
+
+	# This kills and ensures no output other than the prompt comes out,
+	# like:
+	#
+	#  (gdb) kill
+	#  Kill the program being debugged? (y or n) y
+	#  (gdb)
+	#
+	# If we instead saw more output, like e.g., with an extended-remote
+	# connection:
+	#
+	#  (gdb) kill
+	#  Kill the program being debugged? (y or n) y
+	#  Remote connection closed
+	#  (gdb)
+	#
+	# the above would mean that the remote end crashed.
+
+	gdb_test "kill" "^y" "kill program" "Kill the program being debugged\\? \\(y or n\\) $" "y"
+    }
+}
+
+foreach threaded {true false} {
+    test $threaded
+}
-- 
1.9.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
  2014-07-10 15:17     ` [PATCH v2] GDBserver crashes when killing a multi-thread process Pedro Alves
@ 2014-07-11  8:21       ` Hui Zhu
  2014-07-11 10:53         ` [PUSHED+7.8] " Pedro Alves
  2015-07-13 16:07       ` Yao Qi
  1 sibling, 1 reply; 10+ messages in thread
From: Hui Zhu @ 2014-07-11  8:21 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches ml

Hi Pedro,

Thanks for your patch.  It fixed the issue in my part.

Best,
Hui

On 07/10/14 23:16, Pedro Alves wrote:
> Hi Hui,
>
> On 07/02/2014 10:07 AM, Pedro Alves wrote:
>>> This issue is introduced by the patches for PR 12702 that was pushed by you.
>>> Maybe you could take a look on this patch.
>>
>> Thanks.  I'll need to think a bit about whether this is the right
>> solution.
>>
>> I've added it to the 7.8 regression queue at:
>>
>>   https://sourceware.org/gdb/wiki/GDB_7.8_Release
>>
>> I'll look better at it as soon as I have a chance.
>>
>
> First, it's really annoying that the testsuite doesn't catch this!
> We're really missing such a basic test, so I went ahead and wrote one.
> I was actually quite surprised that we didn't have any
> test that makes sure "kill" works without complaints/spurious output.
>
> However, unfortunately, the new test revealed that the fix isn't right.  :-/
>
> Sometimes the test passed, but other times (that'll depend on
> machine, of course), I'd get:
>
>   kill
>   Kill the program being debugged? (y or n) y
>   Ignoring packet error, continuing...
>   (gdb) FAIL: gdb.threads/kill.exp: kill
>
> That means GDBserver didn't reply to the vKill packet.
>
> Attaching to GDBserver at this point shows that it's stuck forever
> in the new loop:
>
>
>   +  /* Keep call kill_one_lwp_callback until find_inferior cannot find any
>   +     lwps that is for pid.  */
>   +  while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL);
>
>
> The reason was that the leader lwp goes zombie before the other
> lwps are reaped, and linux_wait_for_event would delete it.  We'd be
> left with e.g., only one lwp that is _not_ the leader.
>
> That lwp had disappeared already (due to SIGKILL), but there's
> no status to collect (waitpid would return -1).  Nothing in linux-low.c
> would notice the lwp is already gone.  Because that lwp has lwp->stopped == 0,
> linux_wait_for_event would return, here:
>
>        if ((find_inferior (&all_threads,
> 			  not_stopped_callback,
> 			  &wait_ptid) == NULL))
> 	{
> 	  if (debug_threads)
> 	    debug_printf ("LLW: exit (no unwaited-for LWP)\n");
> 	  sigprocmask (SIG_SETMASK, &prev_mask, NULL);
> 	  return -1;
> 	}
>
> So this would return true:
>
>   --- a/gdb/gdbserver/linux-low.c
>   +++ b/gdb/gdbserver/linux-low.c
>   @@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l
>         pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
>       } while (pid > 0 && WIFSTOPPED (wstat));
>
>   -  return 0;
>   +  /* Let find_inferior return because maybe other lwp in the list will
>   +     be deleted by delete_lwp.  */
>   +  return 1;
>    }
>
> So the new loop would never break.
>
> Here's a different patch that passes the test cleanly for me.
> Let me know if it works for you.
>
> 8<-------------------
>  From ed3e42e5447e0276d6fc759a27408d06e9ccf0b7 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 10 Jul 2014 15:13:26 +0100
> Subject: [PATCH] GDBserver crashes when killing a multi-thread process
>
> Here's an example, with the new test:
>
>   gdbserver :9999 gdb.threads/kill
>   gdb gdb.threads/kill
>   (gdb) b 52
>   Breakpoint 1 at 0x4007f4: file kill.c, line 52.
>   Continuing.
>
>   Breakpoint 1, main () at kill.c:52
>   52        return 0; /* set break here */
>   (gdb) k
>   Kill the program being debugged? (y or n) y
>
>   gdbserver :9999 gdb.threads/kill
>   Process gdb.base/watch_thread_num created; pid = 9719
>   Listening on port 1234
>   Remote debugging from host 127.0.0.1
>   Killing all inferiors
>   Segmentation fault (core dumped)
>
> Backtrace:
>
>   (gdb) bt
>   #0  0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
>   #1  0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
>   #2  0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
>   #3  0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
>   #4  0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
>   #5  0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
>   #6  0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
>   #7  0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
>   #8  0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
>   #9  0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
>   #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283
>
> The problem is that linux_wait_for_event deletes lwps that have exited
> (even those not passed in as lwps of interest), while the lwp/thread
> list is being walked on with find_inferior.  find_inferior can handle
> the current iterated inferior being deleted, but not others.
>
> When killing lwps, we don't really care about any of the pending
> status handling of linux_wait_for_event.  We can just waitpid the lwps
> directly, which is also what GDB does (see
> linux-nat.c:kill_wait_callback).  This way the lwps are not deleted
> while we're walking the list.  They'll be deleted by linux_mourn
> afterwards.
>
> This crash triggers several times when running the testsuite against
> GDBserver with the native-gdbserver board (target remote), but as GDB
> can't distinguish between GDBserver crashing and "kill" being
> sucessful, as in both cases the connection is closed (the 'k' packet
> doesn't require a reply), and the inferior is gone, that results in no
> FAIL.
>
> The patch adds a generic test that catches the issue with
> extended-remote mode (and works fine with native testing too).  Here's
> how it fails with the native-extended-gdbserver board without the fix:
>
>   (gdb) info threads
>     Id   Target Id         Frame
>     6    Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>     5    Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>     4    Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>     3    Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>     2    Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>   * 1    Thread 15367.15367 main () at .../gdb.threads/kill.c:52
>   (gdb) kill
>   Kill the program being debugged? (y or n) y
>   Remote connection closed
>   ^^^^^^^^^^^^^^^^^^^^^^^^
>   (gdb) FAIL: gdb.threads/kill.exp: kill
>
> Extended remote should remain connected after the kill.
>
> gdb/gdbserver/
> 2014-07-10  Pedro Alves  <palves@redhat.com>
>
> 	* linux-low.c (kill_wait_lwp): New function, based on
> 	kill_one_lwp_callback, but use my_waitpid directly.
> 	(kill_one_lwp_callback, linux_kill): Use it.
>
> gdb/testsuite/
> 2014-07-10  Pedro Alves  <palves@redhat.com>
>
> 	* gdb.threads/kill.c: New file.
> 	* gdb.threads/kill.exp: New file.
> ---
>   gdb/gdbserver/linux-low.c          | 68 ++++++++++++++++++++-------------
>   gdb/testsuite/gdb.threads/kill.c   | 64 +++++++++++++++++++++++++++++++
>   gdb/testsuite/gdb.threads/kill.exp | 77 ++++++++++++++++++++++++++++++++++++++
>   3 files changed, 183 insertions(+), 26 deletions(-)
>   create mode 100644 gdb/testsuite/gdb.threads/kill.c
>   create mode 100644 gdb/testsuite/gdb.threads/kill.exp
>
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 61552f4..bdc9aaa 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -909,6 +909,46 @@ linux_kill_one_lwp (struct lwp_info *lwp)
>   		  errno ? strerror (errno) : "OK");
>   }
>
> +/* Kill LWP and wait for it to die.  */
> +
> +static void
> +kill_wait_lwp (struct lwp_info *lwp)
> +{
> +  struct thread_info *thr = get_lwp_thread (lwp);
> +  int pid = ptid_get_pid (ptid_of (thr));
> +  int lwpid = ptid_get_lwp (ptid_of (thr));
> +  int wstat;
> +  int res;
> +
> +  if (debug_threads)
> +    debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid);
> +
> +  do
> +    {
> +      linux_kill_one_lwp (lwp);
> +
> +      /* Make sure it died.  Notes:
> +
> +	 - The loop is most likely unnecessary.
> +
> +         - We don't use linux_wait_for_event as that could delete lwps
> +           while we're iterating over them.  We're not interested in
> +           any pending status at this point, only in making sure all
> +           wait status on the kernel side are collected until the
> +           process is reaped.
> +
> +	 - We don't use __WALL here as the __WALL emulation relies on
> +	   SIGCHLD, and killing a stopped process doesn't generate
> +	   one, nor an exit status.
> +      */
> +      res = my_waitpid (lwpid, &wstat, 0);
> +      if (res == -1 && errno == ECHILD)
> +	res = my_waitpid (lwpid, &wstat, __WCLONE);
> +    } while (res > 0 && WIFSTOPPED (wstat));
> +
> +  gdb_assert (res > 0);
> +}
> +
>   /* Callback for `find_inferior'.  Kills an lwp of a given process,
>      except the leader.  */
>
> @@ -917,7 +957,6 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
>   {
>     struct thread_info *thread = (struct thread_info *) entry;
>     struct lwp_info *lwp = get_thread_lwp (thread);
> -  int wstat;
>     int pid = * (int *) args;
>
>     if (ptid_get_pid (entry->id) != pid)
> @@ -936,14 +975,7 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
>         return 0;
>       }
>
> -  do
> -    {
> -      linux_kill_one_lwp (lwp);
> -
> -      /* Make sure it died.  The loop is most likely unnecessary.  */
> -      pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
> -    } while (pid > 0 && WIFSTOPPED (wstat));
> -
> +  kill_wait_lwp (lwp);
>     return 0;
>   }
>
> @@ -952,8 +984,6 @@ linux_kill (int pid)
>   {
>     struct process_info *process;
>     struct lwp_info *lwp;
> -  int wstat;
> -  int lwpid;
>
>     process = find_process_pid (pid);
>     if (process == NULL)
> @@ -976,21 +1006,7 @@ linux_kill (int pid)
>   		      pid);
>       }
>     else
> -    {
> -      struct thread_info *thr = get_lwp_thread (lwp);
> -
> -      if (debug_threads)
> -	debug_printf ("lk_1: killing lwp %ld, for pid: %d\n",
> -		      lwpid_of (thr), pid);
> -
> -      do
> -	{
> -	  linux_kill_one_lwp (lwp);
> -
> -	  /* Make sure it died.  The loop is most likely unnecessary.  */
> -	  lwpid = linux_wait_for_event (thr->entry.id, &wstat, __WALL);
> -	} while (lwpid > 0 && WIFSTOPPED (wstat));
> -    }
> +    kill_wait_lwp (lwp);
>
>     the_target->mourn (process);
>
> diff --git a/gdb/testsuite/gdb.threads/kill.c b/gdb/testsuite/gdb.threads/kill.c
> new file mode 100644
> index 0000000..aefbb06
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/kill.c
> @@ -0,0 +1,64 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2014 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#ifdef USE_THREADS
> +
> +#include <unistd.h>
> +#include <pthread.h>
> +
> +#define NUM 5
> +
> +pthread_barrier_t barrier;
> +
> +void *
> +thread_function (void *arg)
> +{
> +  volatile unsigned int counter = 1;
> +
> +  pthread_barrier_wait (&barrier);
> +
> +  while (counter > 0)
> +    {
> +      counter++;
> +      usleep (1);
> +    }
> +
> +  pthread_exit (NULL);
> +}
> +
> +#endif /* USE_THREADS */
> +
> +void
> +setup (void)
> +{
> +#ifdef USE_THREADS
> +  pthread_t threads[NUM];
> +  int i;
> +
> +  pthread_barrier_init (&barrier, NULL, NUM + 1);
> +  for (i = 0; i < NUM; i++)
> +    pthread_create (&threads[i], NULL, thread_function, NULL);
> +  pthread_barrier_wait (&barrier);
> +#endif /* USE_THREADS */
> +}
> +
> +int
> +main (void)
> +{
> +  setup ();
> +  return 0; /* set break here */
> +}
> diff --git a/gdb/testsuite/gdb.threads/kill.exp b/gdb/testsuite/gdb.threads/kill.exp
> new file mode 100644
> index 0000000..a3f921f
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/kill.exp
> @@ -0,0 +1,77 @@
> +# This testcase is part of GDB, the GNU debugger.
> +
> +# Copyright 2014 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +standard_testfile
> +
> +# Run the test proper.  THREADED indicates whether to build a threaded
> +# program and spawn several threads before trying to kill the program.
> +
> +proc test {threaded} {
> +    global testfile srcfile
> +
> +    with_test_prefix [expr ($threaded)?"threaded":"non-threaded"] {
> +
> +	set options {debug}
> +	if {$threaded} {
> +	    lappend options "pthreads"
> +	    lappend options "additional_flags=-DUSE_THREADS"
> +	    set prog ${testfile}_threads
> +	} else {
> +	    set prog ${testfile}_nothreads
> +	}
> +
> +	if {[prepare_for_testing "failed to prepare" $prog $srcfile $options] == -1} {
> +	    return -1
> +	}
> +
> +	if { ![runto main] } then {
> +	    fail "run to main"
> +	    return
> +	}
> +
> +	set linenum [gdb_get_line_number "set break here"]
> +	gdb_breakpoint "$srcfile:$linenum"
> +	gdb_continue_to_breakpoint "break here" ".*break here.*"
> +
> +	if {$threaded} {
> +	    gdb_test "info threads" "6.*5.*4.*3.*2.*1.*" "all threads started"
> +	}
> +
> +	# This kills and ensures no output other than the prompt comes out,
> +	# like:
> +	#
> +	#  (gdb) kill
> +	#  Kill the program being debugged? (y or n) y
> +	#  (gdb)
> +	#
> +	# If we instead saw more output, like e.g., with an extended-remote
> +	# connection:
> +	#
> +	#  (gdb) kill
> +	#  Kill the program being debugged? (y or n) y
> +	#  Remote connection closed
> +	#  (gdb)
> +	#
> +	# the above would mean that the remote end crashed.
> +
> +	gdb_test "kill" "^y" "kill program" "Kill the program being debugged\\? \\(y or n\\) $" "y"
> +    }
> +}
> +
> +foreach threaded {true false} {
> +    test $threaded
> +}
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PUSHED+7.8] Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
  2014-07-11  8:21       ` Hui Zhu
@ 2014-07-11 10:53         ` Pedro Alves
  0 siblings, 0 replies; 10+ messages in thread
From: Pedro Alves @ 2014-07-11 10:53 UTC (permalink / raw)
  To: Hui Zhu, gdb-patches ml

On 07/11/2014 08:33 AM, Hui Zhu wrote:
> Hi Pedro,
> 
> Thanks for your patch.  It fixed the issue in my part.

Thank you.  I went ahead and pushed it to both master and gdb-7.8-branch.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
  2014-07-10 15:17     ` [PATCH v2] GDBserver crashes when killing a multi-thread process Pedro Alves
  2014-07-11  8:21       ` Hui Zhu
@ 2015-07-13 16:07       ` Yao Qi
  2015-07-13 17:32         ` Pedro Alves
  1 sibling, 1 reply; 10+ messages in thread
From: Yao Qi @ 2015-07-13 16:07 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches ml



On 10/07/14 16:16, Pedro Alves wrote:
> +static void
> +kill_wait_lwp (struct lwp_info *lwp)
> +{
> +  struct thread_info *thr = get_lwp_thread (lwp);
> +  int pid = ptid_get_pid (ptid_of (thr));
> +  int lwpid = ptid_get_lwp (ptid_of (thr));
> +  int wstat;
> +  int res;
> +
> +  if (debug_threads)
> +    debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid);
> +
> +  do
> +    {
> +      linux_kill_one_lwp (lwp);
> +
> +      /* Make sure it died.  Notes:
> +
> +	 - The loop is most likely unnecessary.
> +
> +         - We don't use linux_wait_for_event as that could delete lwps
> +           while we're iterating over them.  We're not interested in
> +           any pending status at this point, only in making sure all
> +           wait status on the kernel side are collected until the
> +           process is reaped.
> +
> +	 - We don't use __WALL here as the __WALL emulation relies on
> +	   SIGCHLD, and killing a stopped process doesn't generate
> +	   one, nor an exit status.
> +      */
> +      res = my_waitpid (lwpid, &wstat, 0);
> +      if (res == -1 && errno == ECHILD)
> +	res = my_waitpid (lwpid, &wstat, __WCLONE);
> +    } while (res > 0 && WIFSTOPPED (wstat));
> +
> +  gdb_assert (res > 0);
> +}

Hi Pedro,
do you still remember why did you add this assert?  It wasn't
mentioned in the mail 
https://sourceware.org/ml/gdb-patches/2014-07/msg00206.html

I am looking at a GDBserver internal error on x86_64 when I run
gdb.threads/thread-unwindonsignal.exp with GDBserver,

continue^M
Continuing.^M
warning: Remote failure reply: E.No unwaited-for children left.^M
PC register is not available^M
(gdb) FAIL: gdb.threads/thread-unwindonsignal.exp: continue until exit
Remote debugging from host 127.0.0.1^M
ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
monitor exit^M
Killing process(es): 30694^M
(gdb) /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1106: A 
problem internal to GDBserver has been detected.^M
kill_wait_lwp: Assertion `res > 0' failed.

After your patch https://sourceware.org/ml/gdb-patches/2015-03/msg00597.html
GDBserver starts to swallows errors if the LWP is gone.  Then, when
GDBservers kills non-exist LWP, the assert will be triggered.

Why don't we implement kill_wait_lwp like its counterpart in GDB
linux-nat.c:kill_wait_callback? we can loop and assert like this
patch below, (note that this patch fixes the internal error, and
the FAIL is still there).

-- 
Yao (齐尧)

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 7bb9f7f..07d051a 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -1101,9 +1101,9 @@ kill_wait_lwp (struct lwp_info *lwp)
        res = my_waitpid (lwpid, &wstat, 0);
        if (res == -1 && errno == ECHILD)
         res = my_waitpid (lwpid, &wstat, __WCLONE);
-    } while (res > 0 && WIFSTOPPED (wstat));
+    } while (res == lwpid);

-  gdb_assert (res > 0);
+  gdb_assert (res == -1 && errno == ECHILD);
  }

  /* Callback for `find_inferior'.  Kills an lwp of a given process,

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
  2015-07-13 16:07       ` Yao Qi
@ 2015-07-13 17:32         ` Pedro Alves
  2015-07-14  8:00           ` Yao Qi
  0 siblings, 1 reply; 10+ messages in thread
From: Pedro Alves @ 2015-07-13 17:32 UTC (permalink / raw)
  To: Yao Qi, gdb-patches ml

On 07/13/2015 05:07 PM, Yao Qi wrote:

> Hi Pedro,
> do you still remember why did you add this assert?  It wasn't
> mentioned in the mail 
> https://sourceware.org/ml/gdb-patches/2014-07/msg00206.html
> 

Simply because getting here was supposed to indicate
something went wrong elsewhere, but at the time I didn't consider
that the child could die while ptrace-stopped.

> I am looking at a GDBserver internal error on x86_64 when I run
> gdb.threads/thread-unwindonsignal.exp with GDBserver,
> 
> continue^M
> Continuing.^M
> warning: Remote failure reply: E.No unwaited-for children left.^M
> PC register is not available^M
> (gdb) FAIL: gdb.threads/thread-unwindonsignal.exp: continue until exit
> Remote debugging from host 127.0.0.1^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> monitor exit^M
> Killing process(es): 30694^M
> (gdb) /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1106: A 
> problem internal to GDBserver has been detected.^M
> kill_wait_lwp: Assertion `res > 0' failed.
> 
> After your patch https://sourceware.org/ml/gdb-patches/2015-03/msg00597.html

> GDBserver starts to swallows errors if the LWP is gone.  Then, when
> GDBservers kills non-exist LWP, the assert will be triggered.
> 

Looks like I forgot to push the rest of that series:

 https://sourceware.org/ml/gdb-patches/2015-03/msg00182.html

What do you think of that one?

> Why don't we implement kill_wait_lwp like its counterpart in GDB
> linux-nat.c:kill_wait_callback? we can loop and assert like this
> patch below, (note that this patch fixes the internal error, and
> the FAIL is still there).
> 

Seems to me it's not 100% correct to waitpid the pid one more time
after we've already reaped it, because there's a minuscule chance
another process that we're debugging could clone a new lwp that reuses
the PID of the one we've just killed/reaped, and then another iteration
could collect the initial SIGSTOP of the wrong LWP and we'd kill it:

-> kill (pid1, SIGKILL);
<- waitpid (pid1) returns pid1/WSIGNALLED
-> on iteration1: new pid1 clone lwp is spawned
-> ret==pid1, continue iterating
-> kill (pid1, SIGKILL); // killing wrong process
<- waitpid (pid1) returns either SIGSTOP or WSIGNALLED
...

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
  2015-07-13 17:32         ` Pedro Alves
@ 2015-07-14  8:00           ` Yao Qi
  2015-07-14  9:13             ` Pedro Alves
  0 siblings, 1 reply; 10+ messages in thread
From: Yao Qi @ 2015-07-14  8:00 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Yao Qi, gdb-patches ml

Pedro Alves <palves@redhat.com> writes:

> Looks like I forgot to push the rest of that series:
>
>  https://sourceware.org/ml/gdb-patches/2015-03/msg00182.html
>
> What do you think of that one?

Yes, it looks good to me.  We also need it on 7.10 branch.

>
>> Why don't we implement kill_wait_lwp like its counterpart in GDB
>> linux-nat.c:kill_wait_callback? we can loop and assert like this
>> patch below, (note that this patch fixes the internal error, and
>> the FAIL is still there).
>> 
>
> Seems to me it's not 100% correct to waitpid the pid one more time
> after we've already reaped it, because there's a minuscule chance
> another process that we're debugging could clone a new lwp that reuses
> the PID of the one we've just killed/reaped, and then another iteration
> could collect the initial SIGSTOP of the wrong LWP and we'd kill it:
>
> -> kill (pid1, SIGKILL);
> <- waitpid (pid1) returns pid1/WSIGNALLED
> -> on iteration1: new pid1 clone lwp is spawned
> -> ret==pid1, continue iterating
> -> kill (pid1, SIGKILL); // killing wrong process
> <- waitpid (pid1) returns either SIGSTOP or WSIGNALLED
> ...

Yes, that is possible.

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
  2015-07-14  8:00           ` Yao Qi
@ 2015-07-14  9:13             ` Pedro Alves
  0 siblings, 0 replies; 10+ messages in thread
From: Pedro Alves @ 2015-07-14  9:13 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches ml

On 07/14/2015 09:00 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> Looks like I forgot to push the rest of that series:
>>
>>  https://sourceware.org/ml/gdb-patches/2015-03/msg00182.html
>>
>> What do you think of that one?
> 
> Yes, it looks good to me.  We also need it on 7.10 branch.

OK, I'll push it.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-07-14  9:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-23 10:10 [PATCH] Fix crash of gdbserver when kill threads Hui Zhu
2014-06-29  3:28 ` Hui Zhu
2014-07-02  9:07   ` Pedro Alves
2014-07-10 15:17     ` [PATCH v2] GDBserver crashes when killing a multi-thread process Pedro Alves
2014-07-11  8:21       ` Hui Zhu
2014-07-11 10:53         ` [PUSHED+7.8] " Pedro Alves
2015-07-13 16:07       ` Yao Qi
2015-07-13 17:32         ` Pedro Alves
2015-07-14  8:00           ` Yao Qi
2015-07-14  9:13             ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).