public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/5] GNU/Linux, fix attach races/problems
@ 2014-12-16 16:54 Pedro Alves
  2014-12-16 16:54 ` [PATCH 3/5] libthread_db: Skip attaching to terminated and joined threads Pedro Alves
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Pedro Alves @ 2014-12-16 16:54 UTC (permalink / raw)
  To: gdb-patches

Investigating an "attach" bug seemingly caused by a race, related to
the fact that on Linux we have to attach to each thread individually,
I decided to write a test that stresses that aspect of attach.  The
test constantly spawns short-lived threads, and has GDB attach, debug
a little, detach, attach, rinse/repeat a few times.

That actually exposed a set of issues, both in GDB and in
glibc/libthread_db.

One is that the test defeats the current heuristics in place: we still
fail to attach to some threads sometimes, if the program spawns them
quickly enough.  This is fixed by fetching the list of LWPs to attach
to from /proc instead of relying on libthread_db for that.

Another is that some times we'd try to attach to a bogus lwp with
PID==-1, and do a dangerous waitpid, potentially eating an event by
mistake and breaking the debug session as result.

Yet another is a nasty libthread_db event reporting mechanism race
related to detaching from the inferior just while a thread is
reporting an event, resulting in a subsequent attach session tripping
on broken libthread_db events.  We work around this by no longer using
libthread_db for thread event creation/death reporting, which is good
on its own, for being much more robust and efficient.

I've filed a couple glibc bugs as result of all this:

 Bug 17705 - nptl_db: stale thread create/death events if debugger detaches
 https://sourceware.org/bugzilla/show_bug.cgi?id=17705

 Bug 17707 - nptl_db: terminated and joined threads
 https://sourceware.org/bugzilla/show_bug.cgi?id=17707

The series fixes the GDB issues and at the same time works around the
glibc issues.

Tested on x86_64 Fedora 20, native and gdbserver.

Comments?

Pedro Alves (5):
  libthread_db: debug output should go to gdb_stdlog
  Linux: on attach, attach to lwps listed under /proc/$pid/task/
  libthread_db: Skip attaching to terminated and joined threads
  Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is
    supported
  Test attaching to a program that constantly spawns short-lived threads

 gdb/gdbserver/linux-low.c                          | 152 ++++++++------------
 gdb/gdbserver/linux-low.h                          |   6 -
 gdb/gdbserver/thread-db.c                          |  13 +-
 gdb/linux-nat.c                                    |  94 +++++++++++++
 gdb/linux-nat.h                                    |   4 +
 gdb/linux-thread-db.c                              |  84 ++++++++---
 gdb/nat/linux-procfs.c                             | 153 ++++++++++++++++++---
 gdb/nat/linux-procfs.h                             |  32 ++++-
 gdb/nat/linux-ptrace.c                             |  33 ++++-
 gdb/nat/linux-ptrace.h                             |   8 ++
 .../gdb.threads/attach-many-short-lived-threads.c  | 117 ++++++++++++++++
 .../attach-many-short-lived-threads.exp            | 135 ++++++++++++++++++
 gdb/testsuite/gdb.threads/fork-thread-pending.exp  |   2 +-
 .../signal-command-multiple-signals-pending.c      |  11 +-
 .../signal-command-multiple-signals-pending.exp    |   7 +
 15 files changed, 698 insertions(+), 153 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c
 create mode 100644 gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp

-- 
1.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 3/5] libthread_db: Skip attaching to terminated and joined threads
  2014-12-16 16:54 [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
@ 2014-12-16 16:54 ` Pedro Alves
  2014-12-16 16:54 ` [PATCH 1/5] libthread_db: debug output should go to gdb_stdlog Pedro Alves
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Pedro Alves @ 2014-12-16 16:54 UTC (permalink / raw)
  To: gdb-patches

I wrote a test that attaches to a program that constantly spawns
short-lived threads, which exposed several issues.  This is one of
them.

On GNU/Linux, attaching to a multi-threaded program sometimes prints
out warnings like:

 ...
 [New LWP 20700]
 warning: unable to open /proc file '/proc/-1/status'
 [New LWP 20850]
 [New LWP 21019]
 ...

That happens because when a thread exits, and is joined, glibc does:

nptl/pthread_join.c:
pthread_join ()
{
...
  if (__glibc_likely (result == 0))
    {
      /* We mark the thread as terminated and as joined.  */
      pd->tid = -1;
...
     /* Free the TCB.  */
      __free_tcb (pd);
    }

So if we attach or interrupt the program (which does an implicit "info
threads") at just the right (or rather, wrong) time, we can find and
return threads in the libthread_db/pthreads thread list with kernel
thread ID -1.  I've filed glibc PR nptl/17707 for this.  You'll find
more info there.

This patch handles this as a special case in GDB.

This is actually more than just a cosmetic issue.  lin_lwp_attach_lwp
will think that this -1 is an LWP we're not attached to yet, and after
failing to attach will try to check we were already attached to the
process, using a waitpid call, which in this case ends up being
"waitpid (-1, ...", which obviously results in GDB potentially
discarding an event when it shouldn't...

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/gdbserver/
2014-12-16  Pedro Alves  <palves@redhat.com>

	* thread-db.c (find_new_threads_callback): Ignore thread if the
	kernel thread ID is -1.

gdb/
2014-12-16  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (lin_lwp_attach_lwp): Assert that the lwp id we're
	about to wait for is > 0.
	* linux-thread-db.c (find_new_threads_callback): Ignore thread if
	the kernel thread ID is -1.
---
 gdb/gdbserver/thread-db.c | 11 +++++++++++
 gdb/linux-nat.c           |  1 +
 gdb/linux-thread-db.c     | 11 +++++++++++
 3 files changed, 23 insertions(+)

diff --git a/gdb/gdbserver/thread-db.c b/gdb/gdbserver/thread-db.c
index ac94892..2d9980d 100644
--- a/gdb/gdbserver/thread-db.c
+++ b/gdb/gdbserver/thread-db.c
@@ -396,6 +396,17 @@ find_new_threads_callback (const td_thrhandle_t *th_p, void *data)
   if (err != TD_OK)
     error ("Cannot get thread info: %s", thread_db_err_str (err));
 
+  if (ti.ti_lid == -1)
+    {
+      /* A thread with kernel thread ID -1 is either a thread that
+	 exited and was joined, or a thread that is being created but
+	 hasn't started yet, and that is reusing the tcb/stack of a
+	 thread that previously exited and was joined.  (glibc marks
+	 terminated and joined threads with kernel thread ID -1.  See
+	 glibc PR17707.  */
+      return 0;
+    }
+
   /* Check for zombies.  */
   if (ti.ti_state == TD_THR_UNKNOWN || ti.ti_state == TD_THR_ZOMBIE)
     return 0;
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index c6b5280..828064f 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1023,6 +1023,7 @@ lin_lwp_attach_lwp (ptid_t ptid)
 
 		  /* See if we've got a stop for this new child
 		     pending.  If so, we're already attached.  */
+		  gdb_assert (lwpid > 0);
 		  new_pid = my_waitpid (lwpid, &status, WNOHANG);
 		  if (new_pid == -1 && errno == ECHILD)
 		    new_pid = my_waitpid (lwpid, &status, __WCLONE | WNOHANG);
diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c
index a405603..4b26984 100644
--- a/gdb/linux-thread-db.c
+++ b/gdb/linux-thread-db.c
@@ -1606,6 +1606,17 @@ find_new_threads_callback (const td_thrhandle_t *th_p, void *data)
     error (_("find_new_threads_callback: cannot get thread info: %s"),
 	   thread_db_err_str (err));
 
+  if (ti.ti_lid == -1)
+    {
+      /* A thread with kernel thread ID -1 is either a thread that
+	 exited and was joined, or a thread that is being created but
+	 hasn't started yet, and that is reusing the tcb/stack of a
+	 thread that previously exited and was joined.  (glibc marks
+	 terminated and joined threads with kernel thread ID -1.  See
+	 glibc PR17707.  */
+      return 0;
+    }
+
   if (ti.ti_tid == 0)
     {
       /* A thread ID of zero means that this is the main thread, but
-- 
1.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 4/5] Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported
  2014-12-16 16:54 [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
  2014-12-16 16:54 ` [PATCH 3/5] libthread_db: Skip attaching to terminated and joined threads Pedro Alves
  2014-12-16 16:54 ` [PATCH 1/5] libthread_db: debug output should go to gdb_stdlog Pedro Alves
@ 2014-12-16 16:54 ` Pedro Alves
  2014-12-16 21:24   ` Simon Marchi
  2014-12-16 16:54 ` [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/ Pedro Alves
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2014-12-16 16:54 UTC (permalink / raw)
  To: gdb-patches

[A test I wrote stumbled on a libthread_db issue related to thread
event breakpoints.  See glibc PR17705:
 [nptl_db: stale thread create/death events if debugger detaches]
 https://sourceware.org/bugzilla/show_bug.cgi?id=17705

This patch avoids that whole issue by making GDB stop using thread
event breakpoints in the first place, which is good for other reasons
as well, anyway.]

Before PTRACE_EVENT_CLONE (Linux 2.6), the only way to learn about new
threads in the inferior (to attach to them) or to learn about thread
exit was to coordinate with the inferior's glibc/runtime, using
libthread_db.  That works by putting a breakpoint at a magic address
which is called when a new thread is spawned, or when a thread is
about to exit.  When that breakpoint is hit, all threads are stopped,
and then GDB coordinates with libthread_db to read data structures out
of the inferior to learn about what happened.  Then the breakpoint is
single-stepped, and then all threads are re-resumed.  This isn't very
efficient (stops all threads) and is more fragile (inferior's thread
list in memory may be corrupt; libthread_db bugs, etc.) than ideal.

When the kernel supports PTRACE_EVENT_CLONE (which we already make use
of), there's really no need to use libthread_db's event reporting
mechanism to learn about new LWPs.  And if the kernel supports that,
then we learn about LWP exits through regular WIFEXITED wait statuses,
so no need for the death event breakpoint either.

GDBserver has been likewise skipping the thread_db events for a long
while:
  https://sourceware.org/ml/gdb-patches/2007-10/msg00547.html

There's one user-visible difference: we'll no longer print about
threads being created and exiting while the program is running, like:

 [Thread 0x7ffff7dbb700 (LWP 30670) exited]
 [New Thread 0x7ffff7db3700 (LWP 30671)]
 [Thread 0x7ffff7dd3700 (LWP 30667) exited]
 [New Thread 0x7ffff7dab700 (LWP 30672)]
 [Thread 0x7ffff7db3700 (LWP 30671) exited]
 [Thread 0x7ffff7dcb700 (LWP 30668) exited]

This is exactly the same behavior as when debugging against remote
targets / gdbserver.  I actually think that's a good thing (and as
such have listed this in the local/remote parity wiki page a while
ago), as the printing slows down the inferior.  It's also a
distraction to keep bothering the user about short-lived threads that
she won't be able to interact with anyway.  Instead, the user (and
frontend) will be informed about new threads that currently exist in
the program when the program next stops:

 (gdb) c
 ...
 * ctrl-c *
 [New Thread 0x7ffff7963700 (LWP 7797)]
 [New Thread 0x7ffff796b700 (LWP 7796)]

 Program received signal SIGINT, Interrupt.
 [Switching to Thread 0x7ffff796b700 (LWP 7796)]
 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
 81              testq   %rax,%rax
 (gdb) info threads

A couple of tests had assumptions on GDB thread numbers that no longer
hold.

Tested on x86_64 Fedora 20.

gdb/
2014-12-16  Pedro Alves  <palves@redhat.com>

	Skip enabling event reporting if the kernel supports
	PTRACE_EVENT_CLONE.
	* linux-thread-db.c: Include "nat/linux-ptrace.h".
	(thread_db_use_events): New function.
	(try_thread_db_load_1): Check thread_db_use_events before enabling
	event reporting.
	(update_thread_state): New function.
	(attach_thread): Use it.  Check thread_db_use_events before
	enabling event reporting.
	(thread_db_detach): Check thread_db_use_events before disabling
	event reporting.
	(find_new_threads_callback): Check thread_db_use_events before
	enabling event reporting.  Update the thread's state if not using
	libthread_db events.

gdb/testsuite/
2014-12-16  Pedro Alves  <palves@redhat.com>

	* gdb.threads/fork-thread-pending.exp: Switch to the main thread
	instead of to thread 2.
	* gdb.threads/signal-command-multiple-signals-pending.c (main):
	Add barrier around each pthread_create call instead of around all
	calls.
	* gdb.threads/signal-command-multiple-signals-pending.exp (test):
	Set a break on thread_function and have the child threads hit it
	one at at a time.
---
 gdb/linux-thread-db.c                              | 39 ++++++++++++++++++----
 gdb/testsuite/gdb.threads/fork-thread-pending.exp  |  2 +-
 .../signal-command-multiple-signals-pending.c      | 11 +++---
 .../signal-command-multiple-signals-pending.exp    |  7 ++++
 4 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c
index 4b26984..35181f0 100644
--- a/gdb/linux-thread-db.c
+++ b/gdb/linux-thread-db.c
@@ -38,6 +38,7 @@
 #include "observer.h"
 #include "linux-nat.h"
 #include "nat/linux-procfs.h"
+#include "nat/linux-ptrace.h"
 #include "nat/linux-osdata.h"
 #include "auto-load.h"
 #include "cli/cli-utils.h"
@@ -77,6 +78,16 @@ static char *libthread_db_search_path;
    by the "set auto-load libthread-db" command.  */
 static int auto_load_thread_db = 1;
 
+/* Returns true if we need to use thread_db thread create/death event
+   breakpoints to learn about threads.  */
+
+static int
+thread_db_use_events (void)
+{
+  /* Not necessary if the kernel supports clone events.  */
+  return !linux_supports_traceclone ();
+}
+
 /* "show" command for the auto_load_thread_db configuration variable.  */
 
 static void
@@ -832,7 +843,7 @@ try_thread_db_load_1 (struct thread_db_info *info)
     push_target (&thread_db_ops);
 
   /* Enable event reporting, but not when debugging a core file.  */
-  if (target_has_execution)
+  if (target_has_execution && thread_db_use_events ())
     enable_thread_event_reporting ();
 
   return 1;
@@ -1260,6 +1271,17 @@ thread_db_inferior_created (struct target_ops *target, int from_tty)
   check_for_thread_db ();
 }
 
+/* Update the thread's state (what's displayed in "info threads"),
+   from libthread_db thread state information.  */
+
+static void
+update_thread_state (struct private_thread_info *private,
+		     const td_thrinfo_t *ti_p)
+{
+  private->dying = (ti_p->ti_state == TD_THR_UNKNOWN
+		    || ti_p->ti_state == TD_THR_ZOMBIE);
+}
+
 /* Attach to a new thread.  This function is called when we receive a
    TD_CREATE event or when we iterate over all threads and find one
    that wasn't already in our list.  Returns true on success.  */
@@ -1341,8 +1363,7 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
   gdb_assert (ti_p->ti_tid != 0);
   private->th = *th_p;
   private->tid = ti_p->ti_tid;
-  if (ti_p->ti_state == TD_THR_UNKNOWN || ti_p->ti_state == TD_THR_ZOMBIE)
-    private->dying = 1;
+  update_thread_state (private, ti_p);
 
   /* Add the thread to GDB's thread list.  */
   if (tp == NULL)
@@ -1354,7 +1375,7 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
 
   /* Enable thread event reporting for this thread, except when
      debugging a core file.  */
-  if (target_has_execution)
+  if (target_has_execution && thread_db_use_events ())
     {
       err = info->td_thr_event_enable_p (th_p, 1);
       if (err != TD_OK)
@@ -1393,7 +1414,7 @@ thread_db_detach (struct target_ops *ops, const char *args, int from_tty)
 
   if (info)
     {
-      if (target_has_execution)
+      if (target_has_execution && thread_db_use_events ())
 	{
 	  disable_thread_event_reporting (info);
 
@@ -1629,7 +1650,7 @@ find_new_threads_callback (const td_thrhandle_t *th_p, void *data)
 	 need this glibc bug workaround.  */
       info->need_stale_parent_threads_check = 0;
 
-      if (target_has_execution)
+      if (target_has_execution && thread_db_use_events ())
 	{
 	  err = info->td_thr_event_enable_p (th_p, 1);
 	  if (err != TD_OK)
@@ -1666,6 +1687,12 @@ find_new_threads_callback (const td_thrhandle_t *th_p, void *data)
 	   iteration: thread_db_find_new_threads_2 will retry.  */
 	return 1;
     }
+  else if (target_has_execution && !thread_db_use_events ())
+    {
+      /* Need to update this if not using the libthread_db events
+	 (particularly, the TD_DEATH event).  */
+      update_thread_state (tp->private, &ti);
+    }
 
   return 0;
 }
diff --git a/gdb/testsuite/gdb.threads/fork-thread-pending.exp b/gdb/testsuite/gdb.threads/fork-thread-pending.exp
index 57e45c9..357cb9e 100644
--- a/gdb/testsuite/gdb.threads/fork-thread-pending.exp
+++ b/gdb/testsuite/gdb.threads/fork-thread-pending.exp
@@ -46,7 +46,7 @@ gdb_test "continue" "Catchpoint.*" "1, get to the fork event"
 
 gdb_test "info threads" " Thread .* Thread .* Thread .* Thread .*" "1, multiple threads found"
 
-gdb_test "thread 2" ".*" "1, switched away from event thread"
+gdb_test "thread 1" ".*" "1, switched away from event thread"
 
 gdb_test "continue" "Not resuming.*" "1, refused to resume"
 
diff --git a/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.c b/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.c
index 2fc5f53..761bef1 100644
--- a/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.c
+++ b/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.c
@@ -76,12 +76,13 @@ main (void)
   signal (SIGUSR1, handler_sigusr1);
   signal (SIGUSR2, handler_sigusr2);
 
-  pthread_barrier_init (&barrier, NULL, 3);
-
   for (i = 0; i < 2; i++)
-    pthread_create (&child_thread[i], NULL, thread_function, NULL);
-
-  pthread_barrier_wait (&barrier);
+    {
+      pthread_barrier_init (&barrier, NULL, 2);
+      pthread_create (&child_thread[i], NULL, thread_function, NULL);
+      pthread_barrier_wait (&barrier);
+      pthread_barrier_destroy (&barrier);
+    }
 
   all_threads_started ();
 
diff --git a/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.exp b/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.exp
index b5ec00a..f574c57 100644
--- a/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.exp
+++ b/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.exp
@@ -46,6 +46,13 @@ proc test { schedlock } {
 	gdb_test "handle SIGUSR2 stop print pass"
 
 	gdb_test "break all_threads_started" "Breakpoint .* at .*$srcfile.*"
+
+	# Create threads one at a time, to insure stable thread
+	# numbers between runs and targets.
+	gdb_test "break thread_function" "Breakpoint .* at .*$srcfile.*"
+	gdb_test "continue" "thread_function.*" "thread 2 created"
+	gdb_test "continue" "thread_function.*" "thread 3 created"
+
 	gdb_test "continue" "all_threads_started.*"
 
 	# Using schedlock, let the main thread queue a signal for each
-- 
1.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/5] libthread_db: debug output should go to gdb_stdlog
  2014-12-16 16:54 [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
  2014-12-16 16:54 ` [PATCH 3/5] libthread_db: Skip attaching to terminated and joined threads Pedro Alves
@ 2014-12-16 16:54 ` Pedro Alves
  2014-12-17  8:02   ` Yao Qi
  2014-12-16 16:54 ` [PATCH 4/5] Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported Pedro Alves
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2014-12-16 16:54 UTC (permalink / raw)
  To: gdb-patches

Some debug output in linux-thread-db.c was being sent to gdb_stdout,
and some to gdb_stderr, while the right place to send debug output to is
gdb_stdlog.

gdb/
2014-12-16  Pedro Alves  <palves@redhat.com>

	* linux-thread-db.c (thread_db_find_new_threads_silently)
	(try_thread_db_load_1, try_thread_db_load, thread_db_load_search)
	(find_new_threads_once): Print debug output on gdb_stdlog.
---
 gdb/linux-thread-db.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c
index c49b567..a405603 100644
--- a/gdb/linux-thread-db.c
+++ b/gdb/linux-thread-db.c
@@ -671,7 +671,7 @@ thread_db_find_new_threads_silently (ptid_t ptid)
   if (except.reason < 0)
     {
       if (libthread_db_debug)
-	exception_fprintf (gdb_stderr, except,
+	exception_fprintf (gdb_stdlog, except,
 			   "Warning: thread_db_find_new_threads_silently: ");
 
       /* There is a bug fixed between nptl 2.6.1 and 2.7 by
@@ -753,8 +753,8 @@ try_thread_db_load_1 (struct thread_db_info *info)
   if (err != TD_OK)
     {
       if (libthread_db_debug)
-	printf_unfiltered (_("td_ta_new failed: %s\n"),
-			   thread_db_err_str (err));
+	fprintf_unfiltered (gdb_stdlog, _("td_ta_new failed: %s\n"),
+			    thread_db_err_str (err));
       else
         switch (err)
           {
@@ -814,14 +814,16 @@ try_thread_db_load_1 (struct thread_db_info *info)
 
   if (libthread_db_debug || *libthread_db_search_path)
     {
+      struct ui_file *file;
       const char *library;
 
       library = dladdr_to_soname (*info->td_ta_new_p);
       if (library == NULL)
 	library = LIBTHREAD_DB_SO;
 
-      printf_unfiltered (_("Using host libthread_db library \"%s\".\n"),
-			 library);
+      file = *libthread_db_search_path != '\0' ? gdb_stdout : gdb_stdlog;
+      fprintf_unfiltered (file, _("Using host libthread_db library \"%s\".\n"),
+			  library);
     }
 
   /* The thread library was detected.  Activate the thread_db target
@@ -846,8 +848,9 @@ try_thread_db_load (const char *library, int check_auto_load_safe)
   struct thread_db_info *info;
 
   if (libthread_db_debug)
-    printf_unfiltered (_("Trying host libthread_db library: %s.\n"),
-                       library);
+    fprintf_unfiltered (gdb_stdlog,
+			_("Trying host libthread_db library: %s.\n"),
+			library);
 
   if (check_auto_load_safe)
     {
@@ -856,7 +859,8 @@ try_thread_db_load (const char *library, int check_auto_load_safe)
 	  /* Do not print warnings by file_is_auto_load_safe if the library does
 	     not exist at this place.  */
 	  if (libthread_db_debug)
-	    printf_unfiltered (_("open failed: %s.\n"), safe_strerror (errno));
+	    fprintf_unfiltered (gdb_stdlog, _("open failed: %s.\n"),
+				safe_strerror (errno));
 	  return 0;
 	}
 
@@ -871,7 +875,7 @@ try_thread_db_load (const char *library, int check_auto_load_safe)
   if (handle == NULL)
     {
       if (libthread_db_debug)
-	printf_unfiltered (_("dlopen failed: %s.\n"), dlerror ());
+	fprintf_unfiltered (gdb_stdlog, _("dlopen failed: %s.\n"), dlerror ());
       return 0;
     }
 
@@ -885,7 +889,7 @@ try_thread_db_load (const char *library, int check_auto_load_safe)
           const char *const libpath = dladdr_to_soname (td_init);
 
           if (libpath != NULL)
-            printf_unfiltered (_("Host %s resolved to: %s.\n"),
+            fprintf_unfiltered (gdb_stdlog, _("Host %s resolved to: %s.\n"),
                                library, libpath);
         }
     }
@@ -1076,7 +1080,8 @@ thread_db_load_search (void)
 
   do_cleanups (cleanups);
   if (libthread_db_debug)
-    printf_unfiltered (_("thread_db_load_search returning %d\n"), rc);
+    fprintf_unfiltered (gdb_stdlog,
+			_("thread_db_load_search returning %d\n"), rc);
   return rc;
 }
 
@@ -1683,11 +1688,12 @@ find_new_threads_once (struct thread_db_info *info, int iteration,
   if (libthread_db_debug)
     {
       if (except.reason < 0)
-	exception_fprintf (gdb_stderr, except,
+	exception_fprintf (gdb_stdlog, except,
 			   "Warning: find_new_threads_once: ");
 
-      printf_filtered (_("Found %d new threads in iteration %d.\n"),
-		       data.new_threads, iteration);
+      fprintf_unfiltered (gdb_stdlog,
+			  _("Found %d new threads in iteration %d.\n"),
+			  data.new_threads, iteration);
     }
 
   if (errp != NULL)
-- 
1.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/
  2014-12-16 16:54 [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
                   ` (2 preceding siblings ...)
  2014-12-16 16:54 ` [PATCH 4/5] Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported Pedro Alves
@ 2014-12-16 16:54 ` Pedro Alves
  2014-12-16 20:52   ` Simon Marchi
  2014-12-16 17:35 ` [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads Pedro Alves
  2015-01-09 12:03 ` [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
  5 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2014-12-16 16:54 UTC (permalink / raw)
  To: gdb-patches

... instead of relying on libthread_db.

I wrote a test that attaches to a program that constantly spawns
short-lived threads, which exposed several issues.  This is one of
them.

On Linux, we need to attach to all threads of a process (thread group)
individually.  We currently rely on libthread_db to list the threads,
but that is problematic, because libthread_db relies on reading data
structures out of the inferior (which may well be corrupted).  If
threads are being created or exiting just while we try to attach, we
may trip on inconsistencies in the inferior's thread list.  To work
around that, when we see a seemingly corrupt list, we currently retry
a few times:

 static void
 thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new)
 {
 ...
   if (until_no_new)
     {
       /* Require 4 successive iterations which do not find any new threads.
	  The 4 is a heuristic: there is an inherent race here, and I have
	  seen that 2 iterations in a row are not always sufficient to
	  "capture" all threads.  */
 ...

That heuristic may well fail, and when it does, we end up with threads
in the program that aren't under GDB's control.  That's obviously bad
and results in quite mistifying failures, like e.g., the process dying
for seeminly no reason when a thread that wasn't attached trips on a
breakpoint.

There's really no reason to rely on libthread_db for this nowadays
when we have /proc mounted.  In that case, which is the usual case, we
can list the LWPs from /proc/PID/task/.  In fact, GDBserver is already
doing this.  The patch factors out that code that knows to walk the
task/ directory out of GDBserver, and makes GDB use it too.

Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for
them to stop immediately.  Instead, we just tag the LWP as having an
expected stop.  Because we can only set the ptrace options when the
thread stops, we need a new flag in the lwp structure to keep track of
whether we've already set the ptrace options, just like in GDBserver.
Note that nothing issues any ptrace command to the threads between the
PTRACE_ATTACH and the stop, so this is safe (unlike one scenario
described in gdbserver's linux-low.c).

When we attach to a program that has threads exiting while we attach,
it's easy to race with a thread just exiting as we try to attach to
it, like:

  #1 - get current list of threads
  #2 - attach to each listed thread
  #3 - ooops, attach failed, thread is already gone

As this is pretty normal, we shouldn't be issuing a scary warning in
step #3.

When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes
we'll see EPERM as well.  That happens when the kernel still has the
kernel in its task list, but the thread is marked as dead.
Unfortunately, EPERM is ambiguous and we'll get it also on other
scenarios where the thread isn't dead, and in those cases, it's useful
to get a warning.  To distiguish the cases, when we get an EPERM
failure, we open /proc/PID/status, and check the thread's state -- if
the /proc file no longer exists, or the state is "Z (Zombie)" or "X
(Dead)", we ignore the EPERM error silently; otherwise, we'll warn.
Unfortunately, there seems to be a kernel race here.  Sometimes I get
EPERM, and then the /proc state still indicates "R (Running)"...  If
we wait a bit and retry, we do end up seeing X or Z state, or get an
ESRCH.  I thought of making GDB retry the attach a few times, but even
with a 500ms wait and 4 retries, I still see the warning sometimes.  I
haven't been able to identify the kernel path that causes this yet,
but in any case, it looks like a kernel bug to me.  As this just
results failure to suppress a warning that we've been printing since
about forever anyway, I'm just making the test cope with it, and issue
an XFAIL.

gdb/gdbserver/
2014-12-16  Pedro Alves  <palves@redhat.com>

	* linux-low.c (linux_attach_fail_reason_string): Move to
	nat/linux-ptrace.c, and rename.
	(linux_attach_lwp): Update comment.
	(attach_proc_task_lwp_callback): New function.
	(linux_attach): Adjus to rename and use
	linux_proc_attach_tgid_threads.
	(linux_attach_fail_reason_string): Delete declaration.

gdb/
2014-12-16  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (attach_proc_task_lwp_callback): New function.
	(linux_nat_attach): Use linux_proc_attach_tgid_threads.
	(wait_lwp, linux_nat_filter_event): If not set yet, set the lwp's
	ptrace option flags.
	* linux-nat.h (struct lwp_info) <must_set_ptrace_flags>: New
	field.
	* nat/linux-procfs.c: Include <dirent.h>.
	(linux_proc_get_int): New parameter "warn".  Handle it.
	(linux_proc_get_tgid): Adjust.
	(linux_proc_get_tracerpid): Rename to ...
	(linux_proc_get_tracerpid_nowarn): ... this.
	(linux_proc_pid_get_state): New function, factored out from
	(linux_proc_pid_has_state): ... this.  Add new parameter "warn"
	and handle it.
	(linux_proc_pid_is_gone): New function.
	(linux_proc_pid_is_stopped): Adjust.
	(linux_proc_pid_is_zombie_maybe_warn)
	(linux_proc_pid_is_zombie_nowarn): New functions.
	(linux_proc_pid_is_zombie): Use
	linux_proc_pid_is_zombie_maybe_warn.
	(linux_proc_attach_tgid_threads): New function.
	* nat/linux-procfs.h (linux_proc_get_tgid): Update comment.
	(linux_proc_get_tracerpid): Rename to ...
	(linux_proc_get_tracerpid_nowarn): ... this, and update comment.
	(linux_proc_pid_is_gone): New declaration.
	(linux_proc_pid_is_zombie): Update comment.
	(linux_proc_pid_is_zombie_nowarn): New declaration.
	(linux_proc_attach_lwp_func): New typedef.
	(linux_proc_attach_tgid_threads): New declaration.
	* nat/linux-ptrace.c (linux_ptrace_attach_fail_reason): Adjust to
	use nowarn functions.
	(linux_ptrace_attach_fail_reason_string): Move here from
	gdbserver/linux-low.c and rename.
	(ptrace_supports_feature): If the current ptrace options are not
	known yet, check them now, instead of asserting.
	* nat/linux-ptrace.h (linux_ptrace_attach_fail_reason_string):
	Declare.
---
 gdb/gdbserver/linux-low.c | 152 +++++++++++++++++----------------------------
 gdb/gdbserver/linux-low.h |   6 --
 gdb/gdbserver/thread-db.c |   2 +-
 gdb/linux-nat.c           |  93 ++++++++++++++++++++++++++++
 gdb/linux-nat.h           |   4 ++
 gdb/nat/linux-procfs.c    | 153 ++++++++++++++++++++++++++++++++++++++++------
 gdb/nat/linux-procfs.h    |  32 ++++++++--
 gdb/nat/linux-ptrace.c    |  33 +++++++++-
 gdb/nat/linux-ptrace.h    |   8 +++
 9 files changed, 356 insertions(+), 127 deletions(-)

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 65f72a2..de2d407 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -631,31 +631,8 @@ linux_create_inferior (char *program, char **allargs)
   return pid;
 }
 
-char *
-linux_attach_fail_reason_string (ptid_t ptid, int err)
-{
-  static char *reason_string;
-  struct buffer buffer;
-  char *warnings;
-  long lwpid = ptid_get_lwp (ptid);
-
-  xfree (reason_string);
-
-  buffer_init (&buffer);
-  linux_ptrace_attach_fail_reason (lwpid, &buffer);
-  buffer_grow_str0 (&buffer, "");
-  warnings = buffer_finish (&buffer);
-  if (warnings[0] != '\0')
-    reason_string = xstrprintf ("%s (%d), %s",
-				strerror (err), err, warnings);
-  else
-    reason_string = xstrprintf ("%s (%d)",
-				strerror (err), err);
-  xfree (warnings);
-  return reason_string;
-}
-
-/* Attach to an inferior process.  */
+/* Attach to an inferior process.  Returns 0 on success, ERRNO on
+   error.  */
 
 int
 linux_attach_lwp (ptid_t ptid)
@@ -739,6 +716,50 @@ linux_attach_lwp (ptid_t ptid)
   return 0;
 }
 
+/* Callback for linux_proc_attach_tgid_threads.  Attach to PTID if not
+   already attached.  Returns true if a new LWP is found, false
+   otherwise.  */
+
+static int
+attach_proc_task_lwp_callback (ptid_t ptid)
+{
+  /* Is this a new thread?  */
+  if (find_thread_ptid (ptid) == NULL)
+    {
+      int lwpid = ptid_get_lwp (ptid);
+      int err;
+
+      if (debug_threads)
+	debug_printf ("Found new lwp %d\n", lwpid);
+
+      err = linux_attach_lwp (ptid);
+
+      /* Be quiet if we simply raced with the thread exiting.  EPERM
+	 is returned if the thread's task still exists, and is marked
+	 as exited or zombie, as well as other conditions, so in that
+	 case, confirm the status in /proc/PID/status.  */
+      if (err == ESRCH
+	  || (err == EPERM && linux_proc_pid_is_gone (lwpid)))
+	{
+	  if (debug_threads)
+	    {
+	      debug_printf ("Cannot attach to lwp %d: "
+			    "thread is gone (%d: %s)\n",
+			    lwpid, err, strerror (err));
+	    }
+	}
+      else if (err != 0)
+	{
+	  warning (_("Cannot attach to lwp %d: %s"),
+		   lwpid,
+		   linux_ptrace_attach_fail_reason_string (ptid, err));
+	}
+
+      return 1;
+    }
+  return 0;
+}
+
 /* Attach to PID.  If PID is the tgid, attach to it and all
    of its threads.  */
 
@@ -753,7 +774,7 @@ linux_attach (unsigned long pid)
   err = linux_attach_lwp (ptid);
   if (err != 0)
     error ("Cannot attach to process %ld: %s",
-	   pid, linux_attach_fail_reason_string (ptid, err));
+	   pid, linux_ptrace_attach_fail_reason_string (ptid, err));
 
   linux_add_process (pid, 1);
 
@@ -767,75 +788,16 @@ linux_attach (unsigned long pid)
       thread->last_resume_kind = resume_stop;
     }
 
-  if (linux_proc_get_tgid (pid) == pid)
-    {
-      DIR *dir;
-      char pathname[128];
-
-      sprintf (pathname, "/proc/%ld/task", pid);
-
-      dir = opendir (pathname);
-
-      if (!dir)
-	{
-	  fprintf (stderr, "Could not open /proc/%ld/task.\n", pid);
-	  fflush (stderr);
-	}
-      else
-	{
-	  /* At this point we attached to the tgid.  Scan the task for
-	     existing threads.  */
-	  int new_threads_found;
-	  int iterations = 0;
-
-	  while (iterations < 2)
-	    {
-	      struct dirent *dp;
-
-	      new_threads_found = 0;
-	      /* Add all the other threads.  While we go through the
-		 threads, new threads may be spawned.  Cycle through
-		 the list of threads until we have done two iterations without
-		 finding new threads.  */
-	      while ((dp = readdir (dir)) != NULL)
-		{
-		  unsigned long lwp;
-		  ptid_t ptid;
-
-		  /* Fetch one lwp.  */
-		  lwp = strtoul (dp->d_name, NULL, 10);
-
-		  ptid = ptid_build (pid, lwp, 0);
-
-		  /* Is this a new thread?  */
-		  if (lwp != 0 && find_thread_ptid (ptid) == NULL)
-		    {
-		      int err;
-
-		      if (debug_threads)
-			debug_printf ("Found new lwp %ld\n", lwp);
-
-		      err = linux_attach_lwp (ptid);
-		      if (err != 0)
-			warning ("Cannot attach to lwp %ld: %s",
-				 lwp,
-				 linux_attach_fail_reason_string (ptid, err));
-
-		      new_threads_found++;
-		    }
-		}
-
-	      if (!new_threads_found)
-		iterations++;
-	      else
-		iterations = 0;
-
-	      rewinddir (dir);
-	    }
-	  closedir (dir);
-	}
-    }
-
+  /* We must attach to every LWP.  If /proc is mounted, use that to
+     find them now.  On the one hand, the inferior may be using raw
+     clone instead of using pthreads.  On the other hand, even if it
+     is using pthreads, GDB may not be connected yet (thread_db needs
+     to do symbol lookups, through qSymbol).  Also, thread_db walks
+     structures in the inferior's address space to find the list of
+     threads/LWPs, and those structures may well be corrupted.  Note
+     that once thread_db is loaded, we'll still use it to list threads
+     and associate pthread info with each LWP.  */
+  linux_proc_attach_tgid_threads (pid, attach_proc_task_lwp_callback);
   return 0;
 }
 
diff --git a/gdb/gdbserver/linux-low.h b/gdb/gdbserver/linux-low.h
index 4820929..812e234 100644
--- a/gdb/gdbserver/linux-low.h
+++ b/gdb/gdbserver/linux-low.h
@@ -351,12 +351,6 @@ int linux_pid_exe_is_elf_64_file (int pid, unsigned int *machine);
    errno).  */
 int linux_attach_lwp (ptid_t ptid);
 
-/* Return the reason an attach failed, in string form.  ERR is the
-   error returned by linux_attach_lwp (an errno).  This string should
-   be copied into a buffer by the client if the string will not be
-   immediately used, or if it must persist.  */
-char *linux_attach_fail_reason_string (ptid_t ptid, int err);
-
 struct lwp_info *find_lwp_pid (ptid_t ptid);
 void linux_stop_lwp (struct lwp_info *lwp);
 
diff --git a/gdb/gdbserver/thread-db.c b/gdb/gdbserver/thread-db.c
index a704933..ac94892 100644
--- a/gdb/gdbserver/thread-db.c
+++ b/gdb/gdbserver/thread-db.c
@@ -339,7 +339,7 @@ attach_thread (const td_thrhandle_t *th_p, td_thrinfo_t *ti_p)
     {
       warning ("Could not attach to thread %ld (LWP %d): %s\n",
 	       ti_p->ti_tid, ti_p->ti_lid,
-	       linux_attach_fail_reason_string (ptid, err));
+	       linux_ptrace_attach_fail_reason_string (ptid, err));
       return 0;
     }
 
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 845d566..c6b5280 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1137,6 +1137,73 @@ linux_nat_create_inferior (struct target_ops *ops,
 #endif /* HAVE_PERSONALITY */
 }
 
+/* Callback for linux_proc_attach_tgid_threads.  Attach to PTID if not
+   already attached.  Returns true if a new LWP is found, false
+   otherwise.  */
+
+static int
+attach_proc_task_lwp_callback (ptid_t ptid)
+{
+  struct lwp_info *lp;
+
+  /* Ignore LWPs we're already attached to.  */
+  lp = find_lwp_pid (ptid);
+  if (lp == NULL)
+    {
+      int lwpid = ptid_get_lwp (ptid);
+
+      if (ptrace (PTRACE_ATTACH, lwpid, 0, 0) < 0)
+	{
+	  int err = errno;
+
+	  /* Be quiet if we simply raced with the thread exiting.
+	     EPERM is returned if the thread's task still exists, and
+	     is marked as exited or zombie, as well as other
+	     conditions, so in that case, confirm the status in
+	     /proc/PID/status.  */
+	  if (err == ESRCH
+	      || (err == EPERM && linux_proc_pid_is_gone (lwpid)))
+	    {
+	      if (debug_linux_nat)
+		{
+		  fprintf_unfiltered (gdb_stdlog,
+				      "Cannot attach to lwp %d: "
+				      "thread is gone (%d: %s)\n",
+				      lwpid, err, safe_strerror (err));
+		}
+	    }
+	  else
+	    {
+	      warning (_("Cannot attach to lwp %d: %s\n"),
+		       lwpid,
+		       linux_ptrace_attach_fail_reason_string (ptid,
+							       err));
+	    }
+	}
+      else
+	{
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog,
+				"PTRACE_ATTACH %s, 0, 0 (OK)\n",
+				target_pid_to_str (ptid));
+
+	  lp = add_lwp (ptid);
+	  lp->cloned = 1;
+
+	  /* The next time we wait for this LWP we'll see a SIGSTOP as
+	     PTRACE_ATTACH brings it to a halt.  */
+	  lp->signalled = 1;
+
+	  /* We need to wait for a stop before being able to make the
+	     next ptrace call on this LWP.  */
+	  lp->must_set_ptrace_flags = 1;
+	}
+
+      return 1;
+    }
+  return 0;
+}
+
 static void
 linux_nat_attach (struct target_ops *ops, const char *args, int from_tty)
 {
@@ -1230,6 +1297,16 @@ linux_nat_attach (struct target_ops *ops, const char *args, int from_tty)
 
   lp->status = status;
 
+  /* We must attach to every LWP.  If /proc is mounted, use that to
+     find them now.  The inferior may be using raw clone instead of
+     using pthreads.  But even if it is using pthreads, thread_db
+     walks structures in the inferior's address space to find the list
+     of threads/LWPs, and those structures may well be corrupted.
+     Note that once thread_db is loaded, we'll still use it to list
+     threads and associate pthread info with each LWP.  */
+  linux_proc_attach_tgid_threads (ptid_get_pid (lp->ptid),
+				  attach_proc_task_lwp_callback);
+
   if (target_can_async_p ())
     target_async (inferior_event_handler, 0);
 }
@@ -2159,6 +2236,14 @@ wait_lwp (struct lwp_info *lp)
   gdb_assert (WIFSTOPPED (status));
   lp->stopped = 1;
 
+  if (lp->must_set_ptrace_flags)
+    {
+      struct inferior *inf = find_inferior_pid (ptid_get_pid (lp->ptid));
+
+      linux_enable_event_reporting (ptid_get_lwp (lp->ptid), inf->attach_flag);
+      lp->must_set_ptrace_flags = 0;
+    }
+
   /* Handle GNU/Linux's syscall SIGTRAPs.  */
   if (WIFSTOPPED (status) && WSTOPSIG (status) == SYSCALL_SIGTRAP)
     {
@@ -2783,6 +2868,14 @@ linux_nat_filter_event (int lwpid, int status, int *new_pending_p)
      ever being continued.)  */
   lp->stopped = 1;
 
+  if (WIFSTOPPED (status) && lp->must_set_ptrace_flags)
+    {
+      struct inferior *inf = find_inferior_pid (ptid_get_pid (lp->ptid));
+
+      linux_enable_event_reporting (ptid_get_lwp (lp->ptid), inf->attach_flag);
+      lp->must_set_ptrace_flags = 0;
+    }
+
   /* Handle GNU/Linux's syscall SIGTRAPs.  */
   if (WIFSTOPPED (status) && WSTOPSIG (status) == SYSCALL_SIGTRAP)
     {
diff --git a/gdb/linux-nat.h b/gdb/linux-nat.h
index 0aa8377..e7c323a 100644
--- a/gdb/linux-nat.h
+++ b/gdb/linux-nat.h
@@ -33,6 +33,10 @@ struct lwp_info
      and overall process id.  */
   ptid_t ptid;
 
+  /* If this flag is set, we need to set the event request flags the
+     next time we see this LWP stop.  */
+  int must_set_ptrace_flags;
+
   /* Non-zero if this LWP is cloned.  In this context "cloned" means
      that the LWP is reporting to its parent using a signal other than
      SIGCHLD.  */
diff --git a/gdb/nat/linux-procfs.c b/gdb/nat/linux-procfs.c
index 00f2f08..f4f4838 100644
--- a/gdb/nat/linux-procfs.c
+++ b/gdb/nat/linux-procfs.c
@@ -19,12 +19,13 @@
 #include "common-defs.h"
 #include "linux-procfs.h"
 #include "filestuff.h"
+#include <dirent.h>
 
 /* Return the TGID of LWPID from /proc/pid/status.  Returns -1 if not
    found.  */
 
 static int
-linux_proc_get_int (pid_t lwpid, const char *field)
+linux_proc_get_int (pid_t lwpid, const char *field, int warn)
 {
   size_t field_len = strlen (field);
   FILE *status_file;
@@ -35,7 +36,8 @@ linux_proc_get_int (pid_t lwpid, const char *field)
   status_file = gdb_fopen_cloexec (buf, "r");
   if (status_file == NULL)
     {
-      warning (_("unable to open /proc file '%s'"), buf);
+      if (warn)
+	warning (_("unable to open /proc file '%s'"), buf);
       return -1;
     }
 
@@ -56,45 +58,87 @@ linux_proc_get_int (pid_t lwpid, const char *field)
 int
 linux_proc_get_tgid (pid_t lwpid)
 {
-  return linux_proc_get_int (lwpid, "Tgid");
+  return linux_proc_get_int (lwpid, "Tgid", 1);
 }
 
 /* See linux-procfs.h.  */
 
 pid_t
-linux_proc_get_tracerpid (pid_t lwpid)
+linux_proc_get_tracerpid_nowarn (pid_t lwpid)
 {
-  return linux_proc_get_int (lwpid, "TracerPid");
+  return linux_proc_get_int (lwpid, "TracerPid", 0);
 }
 
-/* Return non-zero if 'State' of /proc/PID/status contains STATE.  */
+/* Fill in BUFFER, a buffer with BUFFER_SIZE bytes with the 'State'
+   line of /proc/PID/status.  Returns -1 on failure to open the /proc
+   file, 1 if the line is found, and 0 if not found.  If WARN, warn on
+   failure to open the /proc file.  */
 
 static int
-linux_proc_pid_has_state (pid_t pid, const char *state)
+linux_proc_pid_get_state (pid_t pid, char *buffer, size_t buffer_size,
+			  int warn)
 {
-  char buffer[100];
   FILE *procfile;
-  int retval;
   int have_state;
 
-  xsnprintf (buffer, sizeof (buffer), "/proc/%d/status", (int) pid);
+  xsnprintf (buffer, buffer_size, "/proc/%d/status", (int) pid);
   procfile = gdb_fopen_cloexec (buffer, "r");
   if (procfile == NULL)
     {
-      warning (_("unable to open /proc file '%s'"), buffer);
-      return 0;
+      if (warn)
+	warning (_("unable to open /proc file '%s'"), buffer);
+      return -1;
     }
 
   have_state = 0;
-  while (fgets (buffer, sizeof (buffer), procfile) != NULL)
+  while (fgets (buffer, buffer_size, procfile) != NULL)
     if (strncmp (buffer, "State:", 6) == 0)
       {
 	have_state = 1;
 	break;
       }
-  retval = (have_state && strstr (buffer, state) != NULL);
   fclose (procfile);
-  return retval;
+  return have_state;
+}
+
+/* See linux-procfs.h declaration.  */
+
+int
+linux_proc_pid_is_gone (pid_t pid)
+{
+  char buffer[100];
+  int have_state;
+
+  have_state = linux_proc_pid_get_state (pid, buffer, sizeof buffer, 0);
+  if (have_state < 0)
+    {
+      /* If we can't open the status file, assume the thread has
+	 disappeared.  */
+      return 1;
+    }
+  else if (have_state == 0)
+    {
+      /* No "State:" line, assume thread is alive.  */
+      return 0;
+    }
+  else
+    {
+      return (strstr (buffer, "Z (") != NULL
+	      || strstr (buffer, "X (") != NULL);
+    }
+}
+
+/* Return non-zero if 'State' of /proc/PID/status contains STATE.  If
+   WARN, warn on failure to open the /proc file.  */
+
+static int
+linux_proc_pid_has_state (pid_t pid, const char *state, int warn)
+{
+  char buffer[100];
+  int have_state;
+
+  have_state = linux_proc_pid_get_state (pid, buffer, sizeof buffer, warn);
+  return (have_state > 0 && strstr (buffer, state) != NULL);
 }
 
 /* Detect `T (stopped)' in `/proc/PID/status'.
@@ -103,7 +147,24 @@ linux_proc_pid_has_state (pid_t pid, const char *state)
 int
 linux_proc_pid_is_stopped (pid_t pid)
 {
-  return linux_proc_pid_has_state (pid, "T (stopped)");
+  return linux_proc_pid_has_state (pid, "T (stopped)", 1);
+}
+
+/* Return non-zero if PID is a zombie.  If WARN, warn on failure to
+   open the /proc file.  */
+
+static int
+linux_proc_pid_is_zombie_maybe_warn (pid_t pid, int warn)
+{
+  return linux_proc_pid_has_state (pid, "Z (zombie)", warn);
+}
+
+/* See linux-procfs.h declaration.  */
+
+int
+linux_proc_pid_is_zombie_nowarn (pid_t pid)
+{
+  return linux_proc_pid_is_zombie_maybe_warn (pid, 0);
 }
 
 /* See linux-procfs.h declaration.  */
@@ -111,7 +172,7 @@ linux_proc_pid_is_stopped (pid_t pid)
 int
 linux_proc_pid_is_zombie (pid_t pid)
 {
-  return linux_proc_pid_has_state (pid, "Z (zombie)");
+  return linux_proc_pid_is_zombie_maybe_warn (pid, 1);
 }
 
 /* See linux-procfs.h declaration.  */
@@ -132,3 +193,61 @@ linux_proc_pid_get_ns (pid_t pid, const char *ns)
 
   return NULL;
 }
+
+/* See linux-procfs.h.  */
+
+void
+linux_proc_attach_tgid_threads (pid_t pid,
+				linux_proc_attach_lwp_func attach_lwp)
+{
+  DIR *dir;
+  char pathname[128];
+  int new_threads_found;
+  int iterations;
+
+  if (linux_proc_get_tgid (pid) != pid)
+    return;
+
+  xsnprintf (pathname, sizeof (pathname), "/proc/%ld/task", (long) pid);
+  dir = opendir (pathname);
+  if (dir == NULL)
+    {
+      warning (_("Could not open /proc/%ld/task.\n"), (long) pid);
+      return;
+    }
+
+  /* Scan the task list for existing threads.  While we go through the
+     threads, new threads may be spawned.  Cycle through the list of
+     threads until we have done two iterations without finding new
+     threads.  */
+  for (iterations = 0; iterations < 2; iterations++)
+    {
+      struct dirent *dp;
+
+      new_threads_found = 0;
+      while ((dp = readdir (dir)) != NULL)
+	{
+	  unsigned long lwp;
+
+	  /* Fetch one lwp.  */
+	  lwp = strtoul (dp->d_name, NULL, 10);
+	  if (lwp != 0)
+	    {
+	      ptid_t ptid = ptid_build (pid, lwp, 0);
+
+	      if (attach_lwp (ptid))
+		new_threads_found = 1;
+	    }
+	}
+
+      if (new_threads_found)
+	{
+	  /* Start over.  */
+	  iterations = -1;
+	}
+
+      rewinddir (dir);
+    }
+
+  closedir (dir);
+}
diff --git a/gdb/nat/linux-procfs.h b/gdb/nat/linux-procfs.h
index 5e2a9ea..8ca7a0d 100644
--- a/gdb/nat/linux-procfs.h
+++ b/gdb/nat/linux-procfs.h
@@ -22,28 +22,50 @@
 #include <unistd.h>
 
 /* Return the TGID of LWPID from /proc/pid/status.  Returns -1 if not
-   found.  */
+   found.  Failure to open the /proc file results in a warning.  */
 
 extern int linux_proc_get_tgid (pid_t lwpid);
 
-/* Return the TracerPid of LWPID from /proc/pid/status.  Returns -1 if not
-   found.  */
+/* Return the TracerPid of LWPID from /proc/pid/status.  Returns -1 if
+   not found.  Does not warn on failure to open the /proc file.  */
 
-extern pid_t linux_proc_get_tracerpid (pid_t lwpid);
+extern pid_t linux_proc_get_tracerpid_nowarn (pid_t lwpid);
 
 /* Detect `T (stopped)' in `/proc/PID/status'.
    Other states including `T (tracing stop)' are reported as false.  */
 
 extern int linux_proc_pid_is_stopped (pid_t pid);
 
-/* Return non-zero if PID is a zombie.  */
+/* Return non-zero if PID is a zombie.  Failure to open the
+   /proc/pid/status file results in a warning.  */
 
 extern int linux_proc_pid_is_zombie (pid_t pid);
 
+/* Return non-zero if PID is a zombie.  Does not warn on failure to
+   open the /proc file.  */
+
+extern int linux_proc_pid_is_zombie_nowarn (pid_t pid);
+
+/* Return non-zero if /proc/PID/status indicates that PID is gone
+   (i.e., in Z/Zombie or X/Dead state).  Failure to open the /proc
+   file is assumed to indicate the thread is gone.  */
+
+extern int linux_proc_pid_is_gone (pid_t pid);
+
 /* Return an opaque string identifying PID's NS namespace or NULL if
  * the information is unavailable.  The returned string must be
  * released with xfree.  */
 
 extern char *linux_proc_pid_get_ns (pid_t pid, const char *ns);
 
+/* Callback function for linux_proc_attach_tgid_threads.  If the PTID
+   thread is not yet known, try to attach to it and return true,
+   otherwise return false.  */
+typedef int (*linux_proc_attach_lwp_func) (ptid_t ptid);
+
+/* If PID is a tgid, scan the /proc/PID/task/ directory for existing
+   threads, and call FUNC for each thread found.  */
+extern void linux_proc_attach_tgid_threads (pid_t pid,
+					    linux_proc_attach_lwp_func func);
+
 #endif /* COMMON_LINUX_PROCFS_H */
diff --git a/gdb/nat/linux-ptrace.c b/gdb/nat/linux-ptrace.c
index a0e0c32..dca740a 100644
--- a/gdb/nat/linux-ptrace.c
+++ b/gdb/nat/linux-ptrace.c
@@ -43,18 +43,44 @@ linux_ptrace_attach_fail_reason (pid_t pid, struct buffer *buffer)
 {
   pid_t tracerpid;
 
-  tracerpid = linux_proc_get_tracerpid (pid);
+  tracerpid = linux_proc_get_tracerpid_nowarn (pid);
   if (tracerpid > 0)
     buffer_xml_printf (buffer, _("process %d is already traced "
 				 "by process %d"),
 		       (int) pid, (int) tracerpid);
 
-  if (linux_proc_pid_is_zombie (pid))
+  if (linux_proc_pid_is_zombie_nowarn (pid))
     buffer_xml_printf (buffer, _("process %d is a zombie "
 				 "- the process has already terminated"),
 		       (int) pid);
 }
 
+/* See linux-ptrace.h.  */
+
+char *
+linux_ptrace_attach_fail_reason_string (ptid_t ptid, int err)
+{
+  static char *reason_string;
+  struct buffer buffer;
+  char *warnings;
+  long lwpid = ptid_get_lwp (ptid);
+
+  xfree (reason_string);
+
+  buffer_init (&buffer);
+  linux_ptrace_attach_fail_reason (lwpid, &buffer);
+  buffer_grow_str0 (&buffer, "");
+  warnings = buffer_finish (&buffer);
+  if (warnings[0] != '\0')
+    reason_string = xstrprintf ("%s (%d), %s",
+				strerror (err), err, warnings);
+  else
+    reason_string = xstrprintf ("%s (%d)",
+				strerror (err), err);
+  xfree (warnings);
+  return reason_string;
+}
+
 #if defined __i386__ || defined __x86_64__
 
 /* Address of the 'ret' instruction in asm code block below.  */
@@ -508,7 +534,8 @@ linux_disable_event_reporting (pid_t pid)
 static int
 ptrace_supports_feature (int ptrace_options)
 {
-  gdb_assert (current_ptrace_options >= 0);
+  if (current_ptrace_options == -1)
+    linux_check_ptrace_features ();
 
   return ((current_ptrace_options & ptrace_options) == ptrace_options);
 }
diff --git a/gdb/nat/linux-ptrace.h b/gdb/nat/linux-ptrace.h
index 588d38a..552934c 100644
--- a/gdb/nat/linux-ptrace.h
+++ b/gdb/nat/linux-ptrace.h
@@ -89,6 +89,14 @@ struct buffer;
 #endif
 
 extern void linux_ptrace_attach_fail_reason (pid_t pid, struct buffer *buffer);
+
+/* Find all possible reasons we could have failed to attach to PTID
+   and return them as a string.  ERR is the error PTRACE_ATTACH failed
+   with (an errno).  The result is stored in a static buffer.  This
+   string should be copied into a buffer by the client if the string
+   will not be immediately used, or if it must persist.  */
+extern char *linux_ptrace_attach_fail_reason_string (ptid_t ptid, int err);
+
 extern void linux_ptrace_init_warnings (void);
 extern void linux_enable_event_reporting (pid_t pid, int attached);
 extern void linux_disable_event_reporting (pid_t pid);
-- 
1.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads
  2014-12-16 16:54 [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
                   ` (3 preceding siblings ...)
  2014-12-16 16:54 ` [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/ Pedro Alves
@ 2014-12-16 17:35 ` Pedro Alves
  2014-12-17 11:10   ` Yao Qi
  2015-01-09 12:03 ` [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
  5 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2014-12-16 17:35 UTC (permalink / raw)
  To: gdb-patches

Before the previous fixes, on Linux, this would trigger several
different problems, like:

 [New LWP 27106]
 [New LWP 27047]
 warning: unable to open /proc file '/proc/-1/status'
 [New LWP 27813]
 [New LWP 27869]
 warning: Can't attach LWP 11962: No child processes
 Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: debugger service failed
 warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

gdb/testsuite/
2014-12-16  Pedro Alves  <palves@redhat.com>

	* gdb.threads/attach-many-short-lived-threads.c: New file.
	* gdb.threads/attach-many-short-lived-threads.exp: New file.
---
 .../gdb.threads/attach-many-short-lived-threads.c  | 117 ++++++++++++++++++
 .../attach-many-short-lived-threads.exp            | 135 +++++++++++++++++++++
 2 files changed, 252 insertions(+)
 create mode 100644 gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c
 create mode 100644 gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp

diff --git a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c
new file mode 100644
index 0000000..c0bcaf5
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c
@@ -0,0 +1,117 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2014 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <pthread.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+pthread_t main_thread;
+pthread_attr_t detached_attr;
+pthread_attr_t joinable_attr;
+
+/* Data passed to joinable threads on creation.  This is allocated on
+   the heap and ownership transferred from parent to child.  (We do
+   this because it's not portable to cast pthread_t to pointer.)  */
+
+struct thread_arg
+{
+  pthread_t parent;
+};
+
+void *thread_fn (void *arg);
+
+/* Wrapper for pthread_create.   */
+
+void
+create_thread (pthread_attr_t *attr, struct thread_arg *arg)
+{
+  pthread_t child;
+  int rc;
+
+  while ((rc = pthread_create (&child, attr, thread_fn, arg)) != 0)
+    {
+      fprintf (stderr, "unexpected error from pthread_create: %s (%d)\n",
+	       strerror (rc), rc);
+      sleep (1);
+    }
+}
+
+/* Entry point for threads.  Detached threads have a NULL parameter.
+   Joinable threads first join their parent before spawning a new
+   child (and exiting).  The parent's tid is passed as pthread_create
+   argument.  */
+
+void *
+thread_fn (void *arg)
+{
+  struct thread_arg *p = arg;
+  pthread_attr_t *attr;
+
+  if (p != NULL)
+    {
+      if (p->parent != main_thread)
+	assert (pthread_join (p->parent, NULL) == 0);
+
+      attr = &joinable_attr;
+      p->parent = pthread_self ();
+    }
+  else
+    {
+      attr = &detached_attr;
+    }
+
+  create_thread (attr, p);
+  return NULL;
+}
+
+int
+main (int argc, char *argv[])
+{
+  int i;
+  int n_threads = 100;
+
+  if (argc > 1)
+    n_threads = atoi (argv[1]);
+
+  pthread_attr_init (&detached_attr);
+  pthread_attr_setdetachstate (&detached_attr, PTHREAD_CREATE_DETACHED);
+  pthread_attr_init (&joinable_attr);
+  pthread_attr_setdetachstate (&detached_attr, PTHREAD_CREATE_JOINABLE);
+
+  main_thread = pthread_self ();
+
+  for (i = 0; i < n_threads; ++i)
+    {
+      struct thread_arg *p;
+
+      /* Some threads a joinable, others are detached.  This exercises
+	 different code paths in the runtime.  */
+
+      p = malloc (sizeof *p);
+      p->parent = main_thread;
+      create_thread (&joinable_attr, p);
+
+      create_thread (&detached_attr, NULL);
+    }
+
+  /* Long enough for all the attach/detach sequences done by the .exp
+     file.  */
+  sleep (180);
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
new file mode 100644
index 0000000..b14510c
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
@@ -0,0 +1,135 @@
+# Copyright 2008-2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test attaching to a program that is constantly spawning short-lived
+# threads.  The stresses the edge cases of attaching to threads that
+# have just been created or are in process of dying.  In addition, the
+# test attaches, debugs, detaches, reattaches in a loop a few times,
+# to stress the behavior of the debug API around detach (some systems
+# end up leaving stale state behind that confuse the following
+# attach).
+
+# Manipulation with PID on target is not supported.
+if [is_remote target] then {
+    return 0
+}
+
+standard_testfile
+
+# The test proper.  See description above.
+
+proc test {} {
+    global binfile
+    global gdb_prompt
+    global decimal
+
+    clean_restart ${binfile}
+
+    set testpid [spawn_wait_for_attach $binfile]
+
+    set attempts 10
+    for {set attempt 1} { $attempt <= $attempts } { incr attempt } {
+	with_test_prefix "iter $attempt" {
+	    set attached 0
+	    set eperm 0
+	    set test "attach"
+	    gdb_test_multiple "attach $testpid" $test {
+		-re "new threads in iteration" {
+		    # Seen when "set debug libthread_db" is on.
+		    exp_continue
+		}
+		-re "warning: Cannot attach to lwp $decimal: Operation not permitted" {
+		    # On Linux, PTRACE_ATTACH sometimes fails with
+		    # EPERM, even though /proc/PID/status indicates
+		    # the thread is running.
+		    set eperm 1
+		    exp_continue
+		}
+		-re "debugger service failed.*$gdb_prompt $" {
+		    fail $test
+		}
+		-re "$gdb_prompt $" {
+		    if {$eperm} {
+			kfail "gdb/NNNN" "$test (EPERM)"
+		    } else {
+			pass $test
+		    }
+		}
+		-re "Attaching to program.*process $testpid.*$gdb_prompt $" {
+		    pass $test
+		}
+	    }
+
+	    # Sleep a bit and try updating the thread list.  We should
+	    # know about all threads already at this point.  If we see
+	    # "New Thread" or similar being output, then "attach" is
+	    # failing to actually attach to all threads in the process,
+	    # which would be a bug.
+	    sleep 1
+	    set saw_new 0
+	    set test "info threads"
+	    gdb_test_multiple $test $test {
+		-re "New " {
+		    set saw_new 1
+		    exp_continue
+		}
+		-re "$gdb_prompt $" {
+		}
+	    }
+
+	    gdb_assert !$saw_new "no new threads"
+
+	    # Force breakpoints always inserted, so that threads we might
+	    # have failed to attach to hit them even when threads we do
+	    # know about are stopped.
+	    gdb_test_no_output "set breakpoint always-inserted on"
+
+	    # Run to a breakpoint a few times.  A few threads should spawn
+	    # and die meanwhile.  This checks that thread creation/death
+	    # events carry on correctly after attaching.  Also, be
+	    # detaching from the program and reattaching, we check that
+	    # the program doesn't die due to gdb leaving a pending
+	    # breakpoint hit on a new thread unprocessed.
+	    gdb_test "break thread_fn" "Breakpoint.*" "break thread_fn"
+
+	    # Wait a bit, to give time for most threads to hit the
+	    # breakpoint, including threads we might have failed to
+	    # attach.
+	    sleep 2
+
+	    set bps 3
+	    for {set bp 1} { $bp <= $bps } { incr bp } {
+		gdb_test "continue" "Breakpoint.*" "break at thread_fn: $bp"
+	    }
+
+	    if {$attempt < $attempts} {
+		gdb_test "detach" "Detaching from.*"
+	    } else {
+		gdb_test "kill" "" "kill process" "Kill the program being debugged.*y or n. $" "y"
+	    }
+
+	    gdb_test_no_output "set breakpoint always-inserted off"
+	    delete_breakpoints
+	}
+    }
+
+    remote_exec target "kill -9 ${testpid}"
+}
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] == -1} {
+    return -1
+}
+
+test
-- 
1.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/
  2014-12-16 16:54 ` [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/ Pedro Alves
@ 2014-12-16 20:52   ` Simon Marchi
  2014-12-17 13:35     ` Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Simon Marchi @ 2014-12-16 20:52 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 6493 bytes --]

On 2014-12-16 11:53 AM, Pedro Alves wrote:
> ... instead of relying on libthread_db.
> 
> I wrote a test that attaches to a program that constantly spawns
> short-lived threads, which exposed several issues.  This is one of
> them.
> 
> On Linux, we need to attach to all threads of a process (thread group)
> individually.  We currently rely on libthread_db to list the threads,
> but that is problematic, because libthread_db relies on reading data
> structures out of the inferior (which may well be corrupted).  If
> threads are being created or exiting just while we try to attach, we
> may trip on inconsistencies in the inferior's thread list.  To work
> around that, when we see a seemingly corrupt list, we currently retry
> a few times:
> 
>  static void
>  thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new)
>  {
>  ...
>    if (until_no_new)
>      {
>        /* Require 4 successive iterations which do not find any new threads.
> 	  The 4 is a heuristic: there is an inherent race here, and I have
> 	  seen that 2 iterations in a row are not always sufficient to
> 	  "capture" all threads.  */
>  ...
> 
> That heuristic may well fail, and when it does, we end up with threads
> in the program that aren't under GDB's control.  That's obviously bad
> and results in quite mistifying failures, like e.g., the process dying
> for seeminly no reason when a thread that wasn't attached trips on a
> breakpoint.
> 
> There's really no reason to rely on libthread_db for this nowadays
> when we have /proc mounted.  In that case, which is the usual case, we
> can list the LWPs from /proc/PID/task/.  In fact, GDBserver is already
> doing this.  The patch factors out that code that knows to walk the
> task/ directory out of GDBserver, and makes GDB use it too.
> 
> Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for
> them to stop immediately.  Instead, we just tag the LWP as having an
> expected stop.  Because we can only set the ptrace options when the
> thread stops, we need a new flag in the lwp structure to keep track of
> whether we've already set the ptrace options, just like in GDBserver.
> Note that nothing issues any ptrace command to the threads between the
> PTRACE_ATTACH and the stop, so this is safe (unlike one scenario
> described in gdbserver's linux-low.c).
> 
> When we attach to a program that has threads exiting while we attach,
> it's easy to race with a thread just exiting as we try to attach to
> it, like:
> 
>   #1 - get current list of threads
>   #2 - attach to each listed thread
>   #3 - ooops, attach failed, thread is already gone
> 
> As this is pretty normal, we shouldn't be issuing a scary warning in
> step #3.
> 
> When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes
> we'll see EPERM as well.  That happens when the kernel still has the
> kernel in its task list, but the thread is marked as dead.

"still has the kernel" -> "still has the thread"

> Unfortunately, EPERM is ambiguous and we'll get it also on other
> scenarios where the thread isn't dead, and in those cases, it's useful
> to get a warning.  To distiguish the cases, when we get an EPERM
> failure, we open /proc/PID/status, and check the thread's state -- if
> the /proc file no longer exists, or the state is "Z (Zombie)" or "X
> (Dead)", we ignore the EPERM error silently; otherwise, we'll warn.
> Unfortunately, there seems to be a kernel race here.  Sometimes I get
> EPERM, and then the /proc state still indicates "R (Running)"...  If
> we wait a bit and retry, we do end up seeing X or Z state, or get an
> ESRCH.  I thought of making GDB retry the attach a few times, but even
> with a 500ms wait and 4 retries, I still see the warning sometimes.  I
> haven't been able to identify the kernel path that causes this yet,
> but in any case, it looks like a kernel bug to me.  As this just
> results failure to suppress a warning that we've been printing since
> about forever anyway, I'm just making the test cope with it, and issue
> an XFAIL.
> 
> gdb/gdbserver/
> 2014-12-16  Pedro Alves  <palves@redhat.com>
> 
> 	* linux-low.c (linux_attach_fail_reason_string): Move to
> 	nat/linux-ptrace.c, and rename.
> 	(linux_attach_lwp): Update comment.
> 	(attach_proc_task_lwp_callback): New function.
> 	(linux_attach): Adjus to rename and use

Adjus -> Adjust

> 	linux_proc_attach_tgid_threads.
> 	(linux_attach_fail_reason_string): Delete declaration.
> 
> gdb/
> 2014-12-16  Pedro Alves  <palves@redhat.com>
> 
> 	* linux-nat.c (attach_proc_task_lwp_callback): New function.
> 	(linux_nat_attach): Use linux_proc_attach_tgid_threads.
> 	(wait_lwp, linux_nat_filter_event): If not set yet, set the lwp's
> 	ptrace option flags.
> 	* linux-nat.h (struct lwp_info) <must_set_ptrace_flags>: New
> 	field.
> 	* nat/linux-procfs.c: Include <dirent.h>.
> 	(linux_proc_get_int): New parameter "warn".  Handle it.
> 	(linux_proc_get_tgid): Adjust.
> 	(linux_proc_get_tracerpid): Rename to ...
> 	(linux_proc_get_tracerpid_nowarn): ... this.
> 	(linux_proc_pid_get_state): New function, factored out from
> 	(linux_proc_pid_has_state): ... this.  Add new parameter "warn"
> 	and handle it.
> 	(linux_proc_pid_is_gone): New function.
> 	(linux_proc_pid_is_stopped): Adjust.
> 	(linux_proc_pid_is_zombie_maybe_warn)
> 	(linux_proc_pid_is_zombie_nowarn): New functions.
> 	(linux_proc_pid_is_zombie): Use
> 	linux_proc_pid_is_zombie_maybe_warn.
> 	(linux_proc_attach_tgid_threads): New function.
> 	* nat/linux-procfs.h (linux_proc_get_tgid): Update comment.
> 	(linux_proc_get_tracerpid): Rename to ...
> 	(linux_proc_get_tracerpid_nowarn): ... this, and update comment.
> 	(linux_proc_pid_is_gone): New declaration.
> 	(linux_proc_pid_is_zombie): Update comment.
> 	(linux_proc_pid_is_zombie_nowarn): New declaration.
> 	(linux_proc_attach_lwp_func): New typedef.
> 	(linux_proc_attach_tgid_threads): New declaration.
> 	* nat/linux-ptrace.c (linux_ptrace_attach_fail_reason): Adjust to
> 	use nowarn functions.
> 	(linux_ptrace_attach_fail_reason_string): Move here from
> 	gdbserver/linux-low.c and rename.
> 	(ptrace_supports_feature): If the current ptrace options are not
> 	known yet, check them now, instead of asserting.
> 	* nat/linux-ptrace.h (linux_ptrace_attach_fail_reason_string):
> 	Declare.

I think it makes sense, not that I know anything about it.

Simon


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4079 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/5] Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported
  2014-12-16 16:54 ` [PATCH 4/5] Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported Pedro Alves
@ 2014-12-16 21:24   ` Simon Marchi
  2014-12-17 13:04     ` Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Simon Marchi @ 2014-12-16 21:24 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

[-- Attachment #1: Type: text/plain, Size: 11824 bytes --]

On 2014-12-16 11:53 AM, Pedro Alves wrote:
> [A test I wrote stumbled on a libthread_db issue related to thread
> event breakpoints.  See glibc PR17705:
>  [nptl_db: stale thread create/death events if debugger detaches]
>  https://sourceware.org/bugzilla/show_bug.cgi?id=17705
> 
> This patch avoids that whole issue by making GDB stop using thread
> event breakpoints in the first place, which is good for other reasons
> as well, anyway.]
> 
> Before PTRACE_EVENT_CLONE (Linux 2.6), the only way to learn about new
> threads in the inferior (to attach to them) or to learn about thread
> exit was to coordinate with the inferior's glibc/runtime, using
> libthread_db.  That works by putting a breakpoint at a magic address
> which is called when a new thread is spawned, or when a thread is
> about to exit.  When that breakpoint is hit, all threads are stopped,
> and then GDB coordinates with libthread_db to read data structures out
> of the inferior to learn about what happened.

That is libthread_db's TD_CREATE event? Could you point out where that is
done (stopping all the threads)? From the previous discussion with you, I
was thinking that those breakpoints did not affect execution. I don't find
any code in linux-thread-db.c that would do such a thing.

> Then the breakpoint is
> single-stepped, and then all threads are re-resumed.  This isn't very
> efficient (stops all threads) and is more fragile (inferior's thread
> list in memory may be corrupt; libthread_db bugs, etc.) than ideal.
> 
> When the kernel supports PTRACE_EVENT_CLONE (which we already make use
> of), there's really no need to use libthread_db's event reporting
> mechanism to learn about new LWPs.  And if the kernel supports that,
> then we learn about LWP exits through regular WIFEXITED wait statuses,
> so no need for the death event breakpoint either.
> 
> GDBserver has been likewise skipping the thread_db events for a long
> while:
>   https://sourceware.org/ml/gdb-patches/2007-10/msg00547.html
> 
> There's one user-visible difference: we'll no longer print about
> threads being created and exiting while the program is running, like:
> 
>  [Thread 0x7ffff7dbb700 (LWP 30670) exited]
>  [New Thread 0x7ffff7db3700 (LWP 30671)]
>  [Thread 0x7ffff7dd3700 (LWP 30667) exited]
>  [New Thread 0x7ffff7dab700 (LWP 30672)]
>  [Thread 0x7ffff7db3700 (LWP 30671) exited]
>  [Thread 0x7ffff7dcb700 (LWP 30668) exited]
> 
> This is exactly the same behavior as when debugging against remote
> targets / gdbserver.  I actually think that's a good thing (and as
> such have listed this in the local/remote parity wiki page a while
> ago), as the printing slows down the inferior.  It's also a
> distraction to keep bothering the user about short-lived threads that
> she won't be able to interact with anyway.  Instead, the user (and
> frontend) will be informed about new threads that currently exist in
> the program when the program next stops:

Is this a consequence of the change of algorithm, or did you actively changed
the behavior?

From what I understand, gdb still attaches to the new thread as soon as it spawns
(when it receives the PTRACE_EVENT_CLONE event), so it could print the notice when
the event happens. Not that I mind, but I just want to understand.

>  (gdb) c
>  ...
>  * ctrl-c *
>  [New Thread 0x7ffff7963700 (LWP 7797)]
>  [New Thread 0x7ffff796b700 (LWP 7796)]
> 
>  Program received signal SIGINT, Interrupt.
>  [Switching to Thread 0x7ffff796b700 (LWP 7796)]
>  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
>  81              testq   %rax,%rax
>  (gdb) info threads
> 
> A couple of tests had assumptions on GDB thread numbers that no longer
> hold.
> 
> Tested on x86_64 Fedora 20.
> 
> gdb/
> 2014-12-16  Pedro Alves  <palves@redhat.com>
> 
> 	Skip enabling event reporting if the kernel supports
> 	PTRACE_EVENT_CLONE.
> 	* linux-thread-db.c: Include "nat/linux-ptrace.h".
> 	(thread_db_use_events): New function.
> 	(try_thread_db_load_1): Check thread_db_use_events before enabling
> 	event reporting.
> 	(update_thread_state): New function.
> 	(attach_thread): Use it.  Check thread_db_use_events before
> 	enabling event reporting.
> 	(thread_db_detach): Check thread_db_use_events before disabling
> 	event reporting.
> 	(find_new_threads_callback): Check thread_db_use_events before
> 	enabling event reporting.  Update the thread's state if not using
> 	libthread_db events.
> 
> gdb/testsuite/
> 2014-12-16  Pedro Alves  <palves@redhat.com>
> 
> 	* gdb.threads/fork-thread-pending.exp: Switch to the main thread
> 	instead of to thread 2.
> 	* gdb.threads/signal-command-multiple-signals-pending.c (main):
> 	Add barrier around each pthread_create call instead of around all
> 	calls.
> 	* gdb.threads/signal-command-multiple-signals-pending.exp (test):
> 	Set a break on thread_function and have the child threads hit it
> 	one at at a time.
> ---
>  gdb/linux-thread-db.c                              | 39 ++++++++++++++++++----
>  gdb/testsuite/gdb.threads/fork-thread-pending.exp  |  2 +-
>  .../signal-command-multiple-signals-pending.c      | 11 +++---
>  .../signal-command-multiple-signals-pending.exp    |  7 ++++
>  4 files changed, 47 insertions(+), 12 deletions(-)
> 
> diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c
> index 4b26984..35181f0 100644
> --- a/gdb/linux-thread-db.c
> +++ b/gdb/linux-thread-db.c
> @@ -38,6 +38,7 @@
>  #include "observer.h"
>  #include "linux-nat.h"
>  #include "nat/linux-procfs.h"
> +#include "nat/linux-ptrace.h"
>  #include "nat/linux-osdata.h"
>  #include "auto-load.h"
>  #include "cli/cli-utils.h"
> @@ -77,6 +78,16 @@ static char *libthread_db_search_path;
>     by the "set auto-load libthread-db" command.  */
>  static int auto_load_thread_db = 1;
>  
> +/* Returns true if we need to use thread_db thread create/death event
> +   breakpoints to learn about threads.  */
> +
> +static int
> +thread_db_use_events (void)
> +{
> +  /* Not necessary if the kernel supports clone events.  */
> +  return !linux_supports_traceclone ();
> +}
> +
>  /* "show" command for the auto_load_thread_db configuration variable.  */
>  
>  static void
> @@ -832,7 +843,7 @@ try_thread_db_load_1 (struct thread_db_info *info)
>      push_target (&thread_db_ops);
>  
>    /* Enable event reporting, but not when debugging a core file.  */
> -  if (target_has_execution)
> +  if (target_has_execution && thread_db_use_events ())
>      enable_thread_event_reporting ();
>  
>    return 1;
> @@ -1260,6 +1271,17 @@ thread_db_inferior_created (struct target_ops *target, int from_tty)
>    check_for_thread_db ();
>  }
>  
> +/* Update the thread's state (what's displayed in "info threads"),
> +   from libthread_db thread state information.  */
> +
> +static void
> +update_thread_state (struct private_thread_info *private,
> +		     const td_thrinfo_t *ti_p)
> +{
> +  private->dying = (ti_p->ti_state == TD_THR_UNKNOWN
> +		    || ti_p->ti_state == TD_THR_ZOMBIE);
> +}
> +
>  /* Attach to a new thread.  This function is called when we receive a
>     TD_CREATE event or when we iterate over all threads and find one
>     that wasn't already in our list.  Returns true on success.  */
> @@ -1341,8 +1363,7 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
>    gdb_assert (ti_p->ti_tid != 0);
>    private->th = *th_p;
>    private->tid = ti_p->ti_tid;
> -  if (ti_p->ti_state == TD_THR_UNKNOWN || ti_p->ti_state == TD_THR_ZOMBIE)
> -    private->dying = 1;
> +  update_thread_state (private, ti_p);
>  
>    /* Add the thread to GDB's thread list.  */
>    if (tp == NULL)
> @@ -1354,7 +1375,7 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
>  
>    /* Enable thread event reporting for this thread, except when
>       debugging a core file.  */
> -  if (target_has_execution)
> +  if (target_has_execution && thread_db_use_events ())
>      {
>        err = info->td_thr_event_enable_p (th_p, 1);
>        if (err != TD_OK)
> @@ -1393,7 +1414,7 @@ thread_db_detach (struct target_ops *ops, const char *args, int from_tty)
>  
>    if (info)
>      {
> -      if (target_has_execution)
> +      if (target_has_execution && thread_db_use_events ())
>  	{
>  	  disable_thread_event_reporting (info);
>  
> @@ -1629,7 +1650,7 @@ find_new_threads_callback (const td_thrhandle_t *th_p, void *data)
>  	 need this glibc bug workaround.  */
>        info->need_stale_parent_threads_check = 0;
>  
> -      if (target_has_execution)
> +      if (target_has_execution && thread_db_use_events ())
>  	{
>  	  err = info->td_thr_event_enable_p (th_p, 1);
>  	  if (err != TD_OK)
> @@ -1666,6 +1687,12 @@ find_new_threads_callback (const td_thrhandle_t *th_p, void *data)
>  	   iteration: thread_db_find_new_threads_2 will retry.  */
>  	return 1;
>      }
> +  else if (target_has_execution && !thread_db_use_events ())
> +    {
> +      /* Need to update this if not using the libthread_db events
> +	 (particularly, the TD_DEATH event).  */
> +      update_thread_state (tp->private, &ti);
> +    }
>  
>    return 0;
>  }
> diff --git a/gdb/testsuite/gdb.threads/fork-thread-pending.exp b/gdb/testsuite/gdb.threads/fork-thread-pending.exp
> index 57e45c9..357cb9e 100644
> --- a/gdb/testsuite/gdb.threads/fork-thread-pending.exp
> +++ b/gdb/testsuite/gdb.threads/fork-thread-pending.exp
> @@ -46,7 +46,7 @@ gdb_test "continue" "Catchpoint.*" "1, get to the fork event"
>  
>  gdb_test "info threads" " Thread .* Thread .* Thread .* Thread .*" "1, multiple threads found"
>  
> -gdb_test "thread 2" ".*" "1, switched away from event thread"
> +gdb_test "thread 1" ".*" "1, switched away from event thread"
>  
>  gdb_test "continue" "Not resuming.*" "1, refused to resume"
>  
> diff --git a/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.c b/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.c
> index 2fc5f53..761bef1 100644
> --- a/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.c
> +++ b/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.c
> @@ -76,12 +76,13 @@ main (void)
>    signal (SIGUSR1, handler_sigusr1);
>    signal (SIGUSR2, handler_sigusr2);
>  
> -  pthread_barrier_init (&barrier, NULL, 3);
> -
>    for (i = 0; i < 2; i++)
> -    pthread_create (&child_thread[i], NULL, thread_function, NULL);
> -
> -  pthread_barrier_wait (&barrier);
> +    {
> +      pthread_barrier_init (&barrier, NULL, 2);
> +      pthread_create (&child_thread[i], NULL, thread_function, NULL);
> +      pthread_barrier_wait (&barrier);
> +      pthread_barrier_destroy (&barrier);
> +    }
>  
>    all_threads_started ();
>  
> diff --git a/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.exp b/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.exp
> index b5ec00a..f574c57 100644
> --- a/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.exp
> +++ b/gdb/testsuite/gdb.threads/signal-command-multiple-signals-pending.exp
> @@ -46,6 +46,13 @@ proc test { schedlock } {
>  	gdb_test "handle SIGUSR2 stop print pass"
>  
>  	gdb_test "break all_threads_started" "Breakpoint .* at .*$srcfile.*"
> +
> +	# Create threads one at a time, to insure stable thread
> +	# numbers between runs and targets.
> +	gdb_test "break thread_function" "Breakpoint .* at .*$srcfile.*"
> +	gdb_test "continue" "thread_function.*" "thread 2 created"
> +	gdb_test "continue" "thread_function.*" "thread 3 created"
> +
>  	gdb_test "continue" "all_threads_started.*"
>  
>  	# Using schedlock, let the main thread queue a signal for each
> 


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4079 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] libthread_db: debug output should go to gdb_stdlog
  2014-12-16 16:54 ` [PATCH 1/5] libthread_db: debug output should go to gdb_stdlog Pedro Alves
@ 2014-12-17  8:02   ` Yao Qi
  2014-12-17 13:45     ` Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Yao Qi @ 2014-12-17  8:02 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

>    if (libthread_db_debug || *libthread_db_search_path)
>      {
> +      struct ui_file *file;
>        const char *library;
>  
>        library = dladdr_to_soname (*info->td_ta_new_p);
>        if (library == NULL)
>  	library = LIBTHREAD_DB_SO;
>  
> -      printf_unfiltered (_("Using host libthread_db library \"%s\".\n"),
> -			 library);
> +      file = *libthread_db_search_path != '\0' ? gdb_stdout : gdb_stdlog;

Nit: why don't we check libthread_db_debug instead? like:

      file = libthread_db_debug ? gdb_stdlog : gdb_stdout;

> +      fprintf_unfiltered (file, _("Using host libthread_db library \"%s\".\n"),
> +			  library);

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads
  2014-12-16 17:35 ` [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads Pedro Alves
@ 2014-12-17 11:10   ` Yao Qi
  2014-12-18  0:02     ` Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Yao Qi @ 2014-12-17 11:10 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

> +if [is_remote target] then {
> +    return 0
> +}

We should check
"![isnative] || [is_remote host] || [target_info exists use_gdb_stub]" instead?

> +
> +standard_testfile
> +
> +# The test proper.  See description above.
> +
> +proc test {} {
> +    global binfile
> +    global gdb_prompt
> +    global decimal
> +
> +    clean_restart ${binfile}
> +
> +    set testpid [spawn_wait_for_attach $binfile]
> +
> +    set attempts 10
> +    for {set attempt 1} { $attempt <= $attempts } { incr attempt } {
> +	with_test_prefix "iter $attempt" {
> +	    set attached 0
> +	    set eperm 0
> +	    set test "attach"
> +	    gdb_test_multiple "attach $testpid" $test {
> +		-re "new threads in iteration" {
> +		    # Seen when "set debug libthread_db" is on.
> +		    exp_continue
> +		}
> +		-re "warning: Cannot attach to lwp $decimal: Operation not permitted" {
> +		    # On Linux, PTRACE_ATTACH sometimes fails with
> +		    # EPERM, even though /proc/PID/status indicates
> +		    # the thread is running.
> +		    set eperm 1
> +		    exp_continue
> +		}
> +		-re "debugger service failed.*$gdb_prompt $" {
> +		    fail $test
> +		}
> +		-re "$gdb_prompt $" {
> +		    if {$eperm} {
> +			kfail "gdb/NNNN" "$test (EPERM)"

Replace NNNN with a PR number?

> +		    } else {
> +			pass $test
> +		    }
> +		}
> +		-re "Attaching to program.*process $testpid.*$gdb_prompt $" {
> +		    pass $test
> +		}
> +	    }
> +
> +	    # Sleep a bit and try updating the thread list.  We should
> +	    # know about all threads already at this point.  If we see
> +	    # "New Thread" or similar being output, then "attach" is
> +	    # failing to actually attach to all threads in the process,
> +	    # which would be a bug.
> +	    sleep 1
> +	    set saw_new 0
> +	    set test "info threads"
> +	    gdb_test_multiple $test $test {
> +		-re "New " {
> +		    set saw_new 1
> +		    exp_continue
> +		}
> +		-re "$gdb_prompt $" {
> +		}
> +	    }
> +
> +	    gdb_assert !$saw_new "no new threads"

Nit: I feel the test above can be simplified a little bit,

gdb_test_multiple $test $test {
    -re "New .*$gdb_prompt $" {
        fail "no new threads"
    }
    -re "$gdb_prompt $" {
        pass "no new threads"
    }
}

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/5] Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported
  2014-12-16 21:24   ` Simon Marchi
@ 2014-12-17 13:04     ` Pedro Alves
  0 siblings, 0 replies; 24+ messages in thread
From: Pedro Alves @ 2014-12-17 13:04 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 12/16/2014 09:24 PM, Simon Marchi wrote:

>> Before PTRACE_EVENT_CLONE (Linux 2.6), the only way to learn about new
>> threads in the inferior (to attach to them) or to learn about thread
>> exit was to coordinate with the inferior's glibc/runtime, using
>> libthread_db.  That works by putting a breakpoint at a magic address
>> which is called when a new thread is spawned, or when a thread is
>> about to exit.  When that breakpoint is hit, all threads are stopped,
>> and then GDB coordinates with libthread_db to read data structures out
>> of the inferior to learn about what happened.
> 
> That is libthread_db's TD_CREATE event? Could you point out where that is
> done (stopping all the threads)?

When we're using libthread_db, the linux-thread-db.c target is pushed on
top of the target stack.  So a target_wait call ends up in
linux-thread-db.c:thread_db_wait:

static ptid_t
thread_db_wait (struct target_ops *ops,
		ptid_t ptid, struct target_waitstatus *ourstatus,
		int options)
{
...
  ptid = beneath->to_wait (beneath, ptid, ourstatus, options);
...
  if (ourstatus->kind == TARGET_WAITKIND_STOPPED
      && ourstatus->value.sig == GDB_SIGNAL_TRAP)
    /* Check for a thread event.  */
    check_event (ptid);
...
  return ptid;
}

and the beneath->to_wait call ends up in linux_nat_wait -- _that_ is
what stops all threads just before returning to thread_db_wait.

> From the previous discussion with you, I
> was thinking that those breakpoints did not affect execution. I don't find
> any code in linux-thread-db.c that would do such a thing.

I think you're thinking of https://sourceware.org/ml/gdb-patches/2014-12/msg00210.html

What I was saying is that although the TD_DEATH event results in all
threads stopping and then gdb core resuming the target, it's not when the
TD_DEATH event breakpoint is hit that we call delete_thread, so that's
not when mi_thread_exit ends up called.  Instead, after TD_DEATH, the
thread that is exiting actually still exists and is resumed (it still has
a few instructions to run inside glibc/pthread before actually calling
the exit syscall), and then later when the thread actually does the exit
syscall, waitpid returns an WIFEXITED status for it, and gdb _then_ calls
delete_thread, all within linux-nat.c, without stopping all threads.

>> This is exactly the same behavior as when debugging against remote
>> targets / gdbserver.  I actually think that's a good thing (and as
>> such have listed this in the local/remote parity wiki page a while
>> ago), as the printing slows down the inferior.  It's also a
>> distraction to keep bothering the user about short-lived threads that
>> she won't be able to interact with anyway.  Instead, the user (and
>> frontend) will be informed about new threads that currently exist in
>> the program when the program next stops:
> 
> Is this a consequence of the change of algorithm, or did you actively changed
> the behavior?

Both.  :-)  I made GDB do an implicit "info threads" just before
presenting a user-visible stop to the user a while ago.  See the
update_thread_list call in normal_stop, added in git b57bacecd5 -- see
also references to local/remote parity in that commit's log.

And it's a consequence in that stopping linux-thread-db.c from calling
add_thread results in that update_thread_list call finding new and dead
threads then.

> From what I understand, gdb still attaches to the new thread as soon as it spawns
> (when it receives the PTRACE_EVENT_CLONE event), 

Close, with PTRACE_EVENT_CLONE, gdb is automatically attached to the
new clone; the kernel does that for us.

> so it could print the notice when the event happens. 

Right, see the code in linux_handle_extended_wait that does that,
in non-stop mode, only.  I'd like to remove that bit soon enough
though.  I've mentioned before that I regret having added it.

> Not that I mind, but I just want to understand.

Hope I made things a little clearer.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/
  2014-12-16 20:52   ` Simon Marchi
@ 2014-12-17 13:35     ` Pedro Alves
  0 siblings, 0 replies; 24+ messages in thread
From: Pedro Alves @ 2014-12-17 13:35 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 12/16/2014 08:52 PM, Simon Marchi wrote:
> On 2014-12-16 11:53 AM, Pedro Alves wrote:
>> ... instead of relying on libthread_db.
>>
>> I wrote a test that attaches to a program that constantly spawns
>> short-lived threads, which exposed several issues.  This is one of
>> them.
>>
>> On Linux, we need to attach to all threads of a process (thread group)
>> individually.  We currently rely on libthread_db to list the threads,
>> but that is problematic, because libthread_db relies on reading data
>> structures out of the inferior (which may well be corrupted).  If
>> threads are being created or exiting just while we try to attach, we
>> may trip on inconsistencies in the inferior's thread list.  To work
>> around that, when we see a seemingly corrupt list, we currently retry
>> a few times:
>>
>>  static void
>>  thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new)
>>  {
>>  ...
>>    if (until_no_new)
>>      {
>>        /* Require 4 successive iterations which do not find any new threads.
>> 	  The 4 is a heuristic: there is an inherent race here, and I have
>> 	  seen that 2 iterations in a row are not always sufficient to
>> 	  "capture" all threads.  */
>>  ...
>>
>> That heuristic may well fail, and when it does, we end up with threads
>> in the program that aren't under GDB's control.  That's obviously bad
>> and results in quite mistifying failures, like e.g., the process dying
>> for seeminly no reason when a thread that wasn't attached trips on a
>> breakpoint.
>>
>> There's really no reason to rely on libthread_db for this nowadays
>> when we have /proc mounted.  In that case, which is the usual case, we
>> can list the LWPs from /proc/PID/task/.  In fact, GDBserver is already
>> doing this.  The patch factors out that code that knows to walk the
>> task/ directory out of GDBserver, and makes GDB use it too.
>>
>> Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for
>> them to stop immediately.  Instead, we just tag the LWP as having an
>> expected stop.  Because we can only set the ptrace options when the
>> thread stops, we need a new flag in the lwp structure to keep track of
>> whether we've already set the ptrace options, just like in GDBserver.
>> Note that nothing issues any ptrace command to the threads between the
>> PTRACE_ATTACH and the stop, so this is safe (unlike one scenario
>> described in gdbserver's linux-low.c).
>>
>> When we attach to a program that has threads exiting while we attach,
>> it's easy to race with a thread just exiting as we try to attach to
>> it, like:
>>
>>   #1 - get current list of threads
>>   #2 - attach to each listed thread
>>   #3 - ooops, attach failed, thread is already gone
>>
>> As this is pretty normal, we shouldn't be issuing a scary warning in
>> step #3.
>>
>> When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes
>> we'll see EPERM as well.  That happens when the kernel still has the
>> kernel in its task list, but the thread is marked as dead.
> 
> "still has the kernel" -> "still has the thread"

Indeed.  Fixed locally.

>> 	(linux_attach): Adjus to rename and use
> 
> Adjus -> Adjust
> 

Fixed.

> 
> I think it makes sense, not that I know anything about it.

Thanks for the review.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] libthread_db: debug output should go to gdb_stdlog
  2014-12-17  8:02   ` Yao Qi
@ 2014-12-17 13:45     ` Pedro Alves
  2014-12-17 14:09       ` Yao Qi
  0 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2014-12-17 13:45 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

On 12/17/2014 08:02 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>>    if (libthread_db_debug || *libthread_db_search_path)
>>      {
>> +      struct ui_file *file;
>>        const char *library;
>>  
>>        library = dladdr_to_soname (*info->td_ta_new_p);
>>        if (library == NULL)
>>  	library = LIBTHREAD_DB_SO;
>>  
>> -      printf_unfiltered (_("Using host libthread_db library \"%s\".\n"),
>> -			 library);
>> +      file = *libthread_db_search_path != '\0' ? gdb_stdout : gdb_stdlog;
> 
> Nit: why don't we check libthread_db_debug instead? like:
> 
>       file = libthread_db_debug ? gdb_stdlog : gdb_stdout;

Let me answer that by adding a comment.  Does this make it clearer?

  if (*libthread_db_search_path || libthread_db_debug)
    {
      struct ui_file *file;
      const char *library;

      library = dladdr_to_soname (*info->td_ta_new_p);
      if (library == NULL)
	library = LIBTHREAD_DB_SO;

      /* If we'd print this to gdb_stdout when debug output is
	 disabled, still print it to gdb_stdout if debug output is
	 enabled.  User visible output should not depend on debug
	 settings.  */
      file = *libthread_db_search_path != '\0' ? gdb_stdout : gdb_stdlog;
      fprintf_unfiltered (file, _("Using host libthread_db library \"%s\".\n"),
			  library);
    }

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] libthread_db: debug output should go to gdb_stdlog
  2014-12-17 13:45     ` Pedro Alves
@ 2014-12-17 14:09       ` Yao Qi
  0 siblings, 0 replies; 24+ messages in thread
From: Yao Qi @ 2014-12-17 14:09 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

> Let me answer that by adding a comment.  Does this make it clearer?

Yes, it is clearer....

>
>   if (*libthread_db_search_path || libthread_db_debug)

... and it is also important to exchange them in the condition.

>     {
>       struct ui_file *file;
>       const char *library;
>
>       library = dladdr_to_soname (*info->td_ta_new_p);
>       if (library == NULL)
> 	library = LIBTHREAD_DB_SO;
>
>       /* If we'd print this to gdb_stdout when debug output is
> 	 disabled, still print it to gdb_stdout if debug output is
> 	 enabled.  User visible output should not depend on debug
> 	 settings.  */

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads
  2014-12-17 11:10   ` Yao Qi
@ 2014-12-18  0:02     ` Pedro Alves
  2015-01-05 19:02       ` Breazeal, Don
  0 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2014-12-18  0:02 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

On 12/17/2014 11:10 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> +if [is_remote target] then {
>> +    return 0
>> +}
> 
> We should check
> "![isnative] || [is_remote host] || [target_info exists use_gdb_stub]" instead?

Hmm, I don't think the isnative check would be right.
That would check whether the build and target triplets are the
same, but triplets aren't what matters here.  The issue is that
spawn_wait_for_attach assumes build and target _machines_
(boards) are the same.  And that's what "[is_remote target]"
encodes.

In principle, the test should work with remote host testing,
_if_ the build and target machines are the same, even if that's
not a usual scenario, and in that case, [is_remote target] is false.
If remote host testing, and build != target, then [is_remote target]
should be true.  So seems to me we shouldn't check "is_remote host"
either.

I agree with the use_gdb_stub check though.

So how about we add a helper procedure for this (and use it in
other similar attach tests from here on), so we don't have to
constantly think over the same things?  Added to the patch.

>> +		-re "$gdb_prompt $" {
>> +		    if {$eperm} {
>> +			kfail "gdb/NNNN" "$test (EPERM)"
> 
> Replace NNNN with a PR number?

Ah, completely forgot that, thanks.  I've already added a description
of the kernel issue bit above, so I'll just make this an xfail
instead.

>> +	    # Sleep a bit and try updating the thread list.  We should
>> +	    # know about all threads already at this point.  If we see
>> +	    # "New Thread" or similar being output, then "attach" is
>> +	    # failing to actually attach to all threads in the process,
>> +	    # which would be a bug.
>> +	    sleep 1
>> +	    set saw_new 0
>> +	    set test "info threads"
>> +	    gdb_test_multiple $test $test {
>> +		-re "New " {
>> +		    set saw_new 1
>> +		    exp_continue
>> +		}
>> +		-re "$gdb_prompt $" {
>> +		}
>> +	    }
>> +
>> +	    gdb_assert !$saw_new "no new threads"
> 
> Nit: I feel the test above can be simplified a little bit,
> 
> gdb_test_multiple $test $test {
>     -re "New .*$gdb_prompt $" {
>         fail "no new threads"
>     }
>     -re "$gdb_prompt $" {
>         pass "no new threads"
>     }
> }
> 

Indeed.  I did that change, thanks.

I also cleaned up the test further and fixed a few things.
E.g.: a stumbled on a silly bug here:

> pthread_attr_init (&detached_attr);
> pthread_attr_setdetachstate (&detached_attr, PTHREAD_CREATE_DETACHED);
> pthread_attr_init (&joinable_attr);
> pthread_attr_setdetachstate (&detached_attr, PTHREAD_CREATE_JOINABLE);
                                ^^^^^^^^^^^^^

That lead to resource exhaustion resulting in an occasional timeouts when
testing with the native-extended-gdbserver board.  And then we'd crash here:

+      fprintf (stderr, "unexpected error from pthread_create: %s (%d)\n",
+	       strerror (rc), rc);

because I missed including string.h, leaving strerror unprototyped.

Here's the new version.

From b44fd42eebcb76a1c84cabae7b763e52c6b8239f Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 17 Dec 2014 20:40:05 +0000
Subject: [PATCH] Test attaching to a program that constantly spawns
 short-lived threads

Before the previous fixes, on Linux, this would trigger several
different problems, like:

 [New LWP 27106]
 [New LWP 27047]
 warning: unable to open /proc file '/proc/-1/status'
 [New LWP 27813]
 [New LWP 27869]
 warning: Can't attach LWP 11962: No child processes
 Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: debugger service failed
 warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

gdb/testsuite/
2014-12-17  Pedro Alves  <palves@redhat.com>

	* gdb.threads/attach-many-short-lived-threads.c: New file.
	* gdb.threads/attach-many-short-lived-threads.exp: New file.
	* lib/gdb.exp (can_spawn_for_attach): New procedure.
	(spawn_wait_for_attach): Error out if can_spawn_for_attach returns
	false.
---
 .../gdb.threads/attach-many-short-lived-threads.c  | 151 +++++++++++++++++++++
 .../attach-many-short-lived-threads.exp            | 132 ++++++++++++++++++
 gdb/testsuite/lib/gdb.exp                          |  24 ++++
 3 files changed, 307 insertions(+)
 create mode 100644 gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c
 create mode 100644 gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp

diff --git a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c
new file mode 100644
index 0000000..0528695
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c
@@ -0,0 +1,151 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2014 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+
+pthread_t main_thread;
+pthread_attr_t detached_attr;
+pthread_attr_t joinable_attr;
+
+/* Number of threads we'll create of each variant
+   (joinable/detached).  */
+int n_threads = 50;
+
+/* Mutex used to hold creating detached threads.  */
+pthread_mutex_t dthrds_create_mutex;
+
+/* Wrapper for pthread_create.   */
+
+void
+create_thread (pthread_attr_t *attr,
+	       void *(*start_routine) (void *), void *arg)
+{
+  pthread_t child;
+  int rc;
+
+  while ((rc = pthread_create (&child, attr, start_routine, arg)) != 0)
+    {
+      fprintf (stderr, "unexpected error from pthread_create: %s (%d)\n",
+	       strerror (rc), rc);
+      sleep (1);
+    }
+}
+
+void
+break_fn (void)
+{
+}
+
+/* Data passed to joinable threads on creation.  This is allocated on
+   the heap and ownership transferred from parent to child.  (We do
+   this because it's not portable to cast pthread_t to pointer.)  */
+
+struct thread_arg
+{
+  pthread_t parent;
+};
+
+/* Entry point for joinable threads.  These threads first join their
+   parent before spawning a new child (and exiting).  The parent's tid
+   is passed as pthread_create argument, encapsulated in a struct
+   thread_arg object.  */
+
+void *
+joinable_fn (void *arg)
+{
+  struct thread_arg *p = arg;
+
+  pthread_setname_np (pthread_self (), "joinable");
+
+  if (p->parent != main_thread)
+    assert (pthread_join (p->parent, NULL) == 0);
+
+  p->parent = pthread_self ();
+
+  create_thread (&joinable_attr, joinable_fn, p);
+
+  break_fn ();
+
+  return NULL;
+}
+
+/* Entry point for detached threads.  */
+
+void *
+detached_fn (void *arg)
+{
+  pthread_setname_np (pthread_self (), "detached");
+
+  /* This should throttle threads a bit in case we manage to spawn
+     threads faster than they exit.  */
+  pthread_mutex_lock (&dthrds_create_mutex);
+
+  create_thread (&detached_attr, detached_fn, NULL);
+
+  /* Note this is called before the mutex is unlocked otherwise in
+     non-stop mode, when the breakpoint is hit we'd keep spawning more
+     threads forever while the old threads stay alive (stopped in the
+     breakpoint).  */
+  break_fn ();
+
+  pthread_mutex_unlock (&dthrds_create_mutex);
+
+  return NULL;
+}
+
+int
+main (int argc, char *argv[])
+{
+  int i;
+
+  if (argc > 1)
+    n_threads = atoi (argv[1]);
+
+  pthread_mutex_init (&dthrds_create_mutex, NULL);
+
+  pthread_attr_init (&detached_attr);
+  pthread_attr_setdetachstate (&detached_attr, PTHREAD_CREATE_DETACHED);
+  pthread_attr_init (&joinable_attr);
+  pthread_attr_setdetachstate (&joinable_attr, PTHREAD_CREATE_JOINABLE);
+
+  main_thread = pthread_self ();
+
+  /* Spawn the initial set of test threads.  Some threads are
+     joinable, others are detached.  This exercises different code
+     paths in the runtime.  */
+  for (i = 0; i < n_threads; ++i)
+    {
+      struct thread_arg *p;
+
+      p = malloc (sizeof *p);
+      p->parent = main_thread;
+      create_thread (&joinable_attr, joinable_fn, p);
+
+      create_thread (&detached_attr, detached_fn, NULL);
+    }
+
+  /* Long enough for all the attach/detach sequences done by the .exp
+     file.  */
+  sleep (180);
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
new file mode 100644
index 0000000..ff39956
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
@@ -0,0 +1,132 @@
+# Copyright 2008-2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test attaching to a program that is constantly spawning short-lived
+# threads.  The stresses the edge cases of attaching to threads that
+# have just been created or are in process of dying.  In addition, the
+# test attaches, debugs, detaches, reattaches in a loop a few times,
+# to stress the behavior of the debug API around detach (some systems
+# end up leaving stale state behind that confuse the following
+# attach).
+
+if {![can_spawn_for_attach]} {
+    return 0
+}
+
+standard_testfile
+
+# The test proper.  See description above.
+
+proc test {} {
+    global binfile
+    global gdb_prompt
+    global decimal
+
+    clean_restart ${binfile}
+
+    set testpid [spawn_wait_for_attach $binfile]
+
+    set attempts 10
+    for {set attempt 1} { $attempt <= $attempts } { incr attempt } {
+	with_test_prefix "iter $attempt" {
+	    set attached 0
+	    set eperm 0
+	    set test "attach"
+	    gdb_test_multiple "attach $testpid" $test {
+		-re "new threads in iteration" {
+		    # Seen when "set debug libthread_db" is on.
+		    exp_continue
+		}
+		-re "warning: Cannot attach to lwp $decimal: Operation not permitted" {
+		    # On Linux, PTRACE_ATTACH sometimes fails with
+		    # EPERM, even though /proc/PID/status indicates
+		    # the thread is running.
+		    set eperm 1
+		    exp_continue
+		}
+		-re "debugger service failed.*$gdb_prompt $" {
+		    fail $test
+		}
+		-re "$gdb_prompt $" {
+		    if {$eperm} {
+			xfail "$test (EPERM)"
+		    } else {
+			pass $test
+		    }
+		}
+		-re "Attaching to program.*process $testpid.*$gdb_prompt $" {
+		    pass $test
+		}
+	    }
+
+	    # Sleep a bit and try updating the thread list.  We should
+	    # know about all threads already at this point.  If we see
+	    # "New Thread" or similar being output, then "attach" is
+	    # failing to actually attach to all threads in the process,
+	    # which would be a bug.
+	    sleep 1
+
+	    set test "no new threads"
+	    gdb_test_multiple "info threads" $test {
+		-re "New .*$gdb_prompt $" {
+		    fail $test
+		}
+		-re "$gdb_prompt $" {
+		    pass $test
+		}
+	    }
+
+	    # Force breakpoints always inserted, so that threads we might
+	    # have failed to attach to hit them even when threads we do
+	    # know about are stopped.
+	    gdb_test_no_output "set breakpoint always-inserted on"
+
+	    # Run to a breakpoint a few times.  A few threads should spawn
+	    # and die meanwhile.  This checks that thread creation/death
+	    # events carry on correctly after attaching.  Also, be
+	    # detaching from the program and reattaching, we check that
+	    # the program doesn't die due to gdb leaving a pending
+	    # breakpoint hit on a new thread unprocessed.
+	    gdb_test "break break_fn" "Breakpoint.*" "break break_fn"
+
+	    # Wait a bit, to give time for most threads to hit the
+	    # breakpoint, including threads we might have failed to
+	    # attach.
+	    sleep 2
+
+	    set bps 3
+	    for {set bp 1} { $bp <= $bps } { incr bp } {
+		gdb_test "continue" "Breakpoint.*" "break at break_fn: $bp"
+	    }
+
+	    if {$attempt < $attempts} {
+		gdb_test "detach" "Detaching from.*"
+	    } else {
+		gdb_test "kill" "" "kill process" "Kill the program being debugged.*y or n. $" "y"
+	    }
+
+	    gdb_test_no_output "set breakpoint always-inserted off"
+	    delete_breakpoints
+	}
+    }
+
+    remote_exec target "kill -9 ${testpid}"
+}
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] == -1} {
+    return -1
+}
+
+test
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 08087f2..83fa1d0 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -3413,12 +3413,36 @@ proc gdb_exit { } {
     catch default_gdb_exit
 }

+# Return true if we can spawn a program on the target and attach to
+# it.
+
+proc can_spawn_for_attach { } {
+    # We use TCL's exec to get the inferior's pid.
+    if [is_remote target] then {
+	return 0
+    }
+
+    # The "attach" command doesn't make sense when the target is
+    # stub-like, where GDB finds the program already started on
+    # initial connection.
+    if {[target_info exists use_gdb_stub]} {
+	return 0
+    }
+
+    # Assume yes.
+    return 1
+}
+
 # Start a set of programs running and then wait for a bit, to be sure
 # that they can be attached to.  Return a list of the processes' PIDs.

 proc spawn_wait_for_attach { executable_list } {
     set pid_list {}

+    if ![can_spawn_for_attach] {
+	error "can't spawn for attach with this target/board"
+    }
+
     foreach {executable} $executable_list {
 	lappend pid_list [eval exec $executable &]
     }
-- 
1.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads
  2014-12-18  0:02     ` Pedro Alves
@ 2015-01-05 19:02       ` Breazeal, Don
  2015-01-07 16:17         ` [PATCH] skip "attach" tests when testing against stub-like targets (was: Re: [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads) Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Breazeal, Don @ 2015-01-05 19:02 UTC (permalink / raw)
  To: Pedro Alves, Yao Qi; +Cc: gdb-patches

On 12/17/2014 4:02 PM, Pedro Alves wrote:

> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
> index 08087f2..83fa1d0 100644
> --- a/gdb/testsuite/lib/gdb.exp
> +++ b/gdb/testsuite/lib/gdb.exp
> @@ -3413,12 +3413,36 @@ proc gdb_exit { } {
>      catch default_gdb_exit
>  }
> 
> +# Return true if we can spawn a program on the target and attach to
> +# it.
> +
> +proc can_spawn_for_attach { } {
> +    # We use TCL's exec to get the inferior's pid.
> +    if [is_remote target] then {
> +	return 0
> +    }
> +
> +    # The "attach" command doesn't make sense when the target is
> +    # stub-like, where GDB finds the program already started on
> +    # initial connection.
> +    if {[target_info exists use_gdb_stub]} {
> +	return 0
> +    }
> +
> +    # Assume yes.
> +    return 1
> +}
> +
This solves the problem that I was working on here:

https://sourceware.org/ml/gdb-patches/2014-12/msg00520.html

When I call can_spawn_for_attach in the misbehaving attach tests I was
working on, they no longer spawn processes for 'target remote' that they
can't attach to.  Thanks!

>  # Start a set of programs running and then wait for a bit, to be sure
>  # that they can be attached to.  Return a list of the processes' PIDs.
> 
>  proc spawn_wait_for_attach { executable_list } {
>      set pid_list {}
> 
> +    if ![can_spawn_for_attach] {
> +	error "can't spawn for attach with this target/board"
> +    }

Should this be calling "error", or should it call something like
"untested" or "unsupported", since it isn't expected to work in these cases?

Thanks
--Don

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] skip "attach" tests when testing against stub-like targets (was: Re: [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads)
  2015-01-05 19:02       ` Breazeal, Don
@ 2015-01-07 16:17         ` Pedro Alves
  2015-01-09 11:24           ` [PATCH] skip "attach" tests when testing against stub-like targets Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2015-01-07 16:17 UTC (permalink / raw)
  To: Breazeal, Don, Yao Qi; +Cc: gdb-patches

On 01/05/2015 07:01 PM, Breazeal, Don wrote:
> On 12/17/2014 4:02 PM, Pedro Alves wrote:
> 
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index 08087f2..83fa1d0 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -3413,12 +3413,36 @@ proc gdb_exit { } {
>>      catch default_gdb_exit
>>  }
>>
>> +# Return true if we can spawn a program on the target and attach to
>> +# it.
>> +
>> +proc can_spawn_for_attach { } {
>> +    # We use TCL's exec to get the inferior's pid.
>> +    if [is_remote target] then {
>> +	return 0
>> +    }
>> +
>> +    # The "attach" command doesn't make sense when the target is
>> +    # stub-like, where GDB finds the program already started on
>> +    # initial connection.
>> +    if {[target_info exists use_gdb_stub]} {
>> +	return 0
>> +    }
>> +
>> +    # Assume yes.
>> +    return 1
>> +}
>> +
> This solves the problem that I was working on here:
> 
> https://sourceware.org/ml/gdb-patches/2014-12/msg00520.html
> 
> When I call can_spawn_for_attach in the misbehaving attach tests I was
> working on, they no longer spawn processes for 'target remote' that they
> can't attach to.  Thanks!

Great!

> 
>>  # Start a set of programs running and then wait for a bit, to be sure
>>  # that they can be attached to.  Return a list of the processes' PIDs.
>>
>>  proc spawn_wait_for_attach { executable_list } {
>>      set pid_list {}
>>
>> +    if ![can_spawn_for_attach] {
>> +	error "can't spawn for attach with this target/board"
>> +    }
> 
> Should this be calling "error", or should it call something like
> "untested" or "unsupported", since it isn't expected to work in these cases?

The idea is that all .exp files that use spawn_wait_for_attach
would have already checked can_spawn_for_attach early, and skipped the
tests if false.  That makes is a test bug to see a call to
spawn_wait_for_attach if can_spawn_for_attach is false.

I should have really split those hunks out to a separate patch and
added calls to can_spawn_for_attach in all tests that are using
spawn_wait_for_attach already.  Like below.  WDYT?

(There are probably other attach tests that don't use
spawn_wait_for_attach that need the can_spawn_for_attach too.
We can do this incrementally.)

--------
From a7d938a9ca3762ea195a20b796772865a47283b6 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 7 Jan 2015 15:38:02 +0000
Subject: [PATCH] skip "attach" tests when testing against stub-like targets

We already skip "attach" tests if the target board is remote, in
dejagnu's sense, as we use TCL's exec to spawn the program on the
build machine.  We should also skip these tests if testing with
"target remote" or other stub-like targets where "attach" doesn't make
sense.

Add a helper procedure that centralizes the checks a test that needs
to spawn a program for testing "attach" and make all test files that
use spawn_wait_for_attach check it.

gdb/testsuite/
2014-12-17  Pedro Alves  <palves@redhat.com>

        * lib/gdb.exp (can_spawn_for_attach): New procedure.
        (spawn_wait_for_attach): Error out if can_spawn_for_attach returns
        false.
	* gdb.base/attach.exp: Use can_spawn_for_attach instead of
	checking whether the target board is remote.
	* gdb.multi/multi-attach.exp: Likewise.
	* gdb.python/py-sync-interp.exp: Likewise.
	* gdb.server/ext-attach.exp: Likewise.
	* gdb.python/py-prompt.exp: Use can_spawn_for_attach before the
	tests that need to attach, instead of checking whether the target
	board is remote at the top of the file.
---
 gdb/testsuite/gdb.base/attach.exp           |  3 +--
 gdb/testsuite/gdb.base/solib-overlap.exp    |  3 +--
 gdb/testsuite/gdb.multi/multi-attach.exp    |  3 +--
 gdb/testsuite/gdb.python/py-prompt.exp      |  9 ++++-----
 gdb/testsuite/gdb.python/py-sync-interp.exp |  3 +--
 gdb/testsuite/gdb.server/ext-attach.exp     |  3 +--
 gdb/testsuite/lib/gdb.exp                   | 27 +++++++++++++++++++++++++++
 7 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/gdb/testsuite/gdb.base/attach.exp b/gdb/testsuite/gdb.base/attach.exp
index 5fb5c53..96e5df4 100644
--- a/gdb/testsuite/gdb.base/attach.exp
+++ b/gdb/testsuite/gdb.base/attach.exp
@@ -25,8 +25,7 @@ if { [istarget "hppa*-*-hpux*"] } {
     return 0
 }
 
-# are we on a target board
-if [is_remote target] then {
+if {![can_spawn_for_attach]} {
     return 0
 }
 
diff --git a/gdb/testsuite/gdb.base/solib-overlap.exp b/gdb/testsuite/gdb.base/solib-overlap.exp
index 7486b07..71ff175 100644
--- a/gdb/testsuite/gdb.base/solib-overlap.exp
+++ b/gdb/testsuite/gdb.base/solib-overlap.exp
@@ -31,8 +31,7 @@ if [skip_shlib_tests] {
     return 0
 }
 
-# Are we on a target board?  It is required for attaching to a process.
-if [is_remote target] {
+if {![can_spawn_for_attach]} {
     return 0
 }
 
diff --git a/gdb/testsuite/gdb.multi/multi-attach.exp b/gdb/testsuite/gdb.multi/multi-attach.exp
index eaff2c9..7f2ac80 100644
--- a/gdb/testsuite/gdb.multi/multi-attach.exp
+++ b/gdb/testsuite/gdb.multi/multi-attach.exp
@@ -19,8 +19,7 @@
 
 standard_testfile
 
-# We need to use TCL's exec to get the pid.
-if [is_remote target] then {
+if {![can_spawn_for_attach]} {
     return 0
 }
 
diff --git a/gdb/testsuite/gdb.python/py-prompt.exp b/gdb/testsuite/gdb.python/py-prompt.exp
index 1c53c03..3d1f264 100644
--- a/gdb/testsuite/gdb.python/py-prompt.exp
+++ b/gdb/testsuite/gdb.python/py-prompt.exp
@@ -18,11 +18,6 @@
 
 standard_testfile
 
-# We need to use TCL's exec to get the pid.
-if [is_remote target] then {
-    return 0
-}
-
 load_lib gdb-python.exp
 load_lib prompt.exp
 
@@ -80,6 +75,10 @@ gdb_test "python print (\"'\" + str(p\[0\]) + \"'\")" "'$gdb_prompt_fail '" \
 	 "prompt_hook argument is default prompt. 2"
 gdb_exit
 
+if {![can_spawn_for_attach]} {
+    return 0
+}
+
 set testpid [spawn_wait_for_attach $binfile]
 
 set GDBFLAGS [concat $tmp_gdbflags " -ex \"set pagination off\""]
diff --git a/gdb/testsuite/gdb.python/py-sync-interp.exp b/gdb/testsuite/gdb.python/py-sync-interp.exp
index d62f966..7efafd1 100644
--- a/gdb/testsuite/gdb.python/py-sync-interp.exp
+++ b/gdb/testsuite/gdb.python/py-sync-interp.exp
@@ -20,8 +20,7 @@
 
 standard_testfile
 
-# We need to use TCL's exec to get the pid.
-if [is_remote target] then {
+if {![can_spawn_for_attach]} {
     return 0
 }
 
diff --git a/gdb/testsuite/gdb.server/ext-attach.exp b/gdb/testsuite/gdb.server/ext-attach.exp
index 9baeeb7..11f5a20 100644
--- a/gdb/testsuite/gdb.server/ext-attach.exp
+++ b/gdb/testsuite/gdb.server/ext-attach.exp
@@ -26,8 +26,7 @@ if { [skip_gdbserver_tests] } {
     return 0
 }
 
-# We need to use TCL's exec to get the pid.
-if [is_remote target] then {
+if {![can_spawn_for_attach]} {
     return 0
 }
 
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 08087f2..4386da8 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -3413,12 +3413,39 @@ proc gdb_exit { } {
     catch default_gdb_exit
 }
 
+# Return true if we can spawn a program on the target and attach to
+# it.
+
+proc can_spawn_for_attach { } {
+    # We use TCL's exec to get the inferior's pid.
+    if [is_remote target] then {
+	return 0
+    }
+
+    # The "attach" command doesn't make sense when the target is
+    # stub-like, where GDB finds the program already started on
+    # initial connection.
+    if {[target_info exists use_gdb_stub]} {
+	return 0
+    }
+
+    # Assume yes.
+    return 1
+}
+
 # Start a set of programs running and then wait for a bit, to be sure
 # that they can be attached to.  Return a list of the processes' PIDs.
+# It's a test error to call this when [can_spawn_for_attach] is false.
 
 proc spawn_wait_for_attach { executable_list } {
     set pid_list {}
 
+    if ![can_spawn_for_attach] {
+	# The caller should have checked can_spawn_for_attach itself
+	# before getting here.
+	error "can't spawn for attach with this target/board"
+    }
+
     foreach {executable} $executable_list {
 	lappend pid_list [eval exec $executable &]
     }
-- 
1.9.3


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] skip "attach" tests when testing against stub-like targets
  2015-01-07 16:17         ` [PATCH] skip "attach" tests when testing against stub-like targets (was: Re: [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads) Pedro Alves
@ 2015-01-09 11:24           ` Pedro Alves
  2015-01-12  4:43             ` [regression/native-gdbserver][buildbot] Python testscases get staled (was: Re: [PATCH] skip "attach" tests when testing against stub-like targets) Sergio Durigan Junior
  0 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2015-01-09 11:24 UTC (permalink / raw)
  To: Breazeal, Don, Yao Qi; +Cc: gdb-patches

On 01/07/2015 04:17 PM, Pedro Alves wrote:
> On 01/05/2015 07:01 PM, Breazeal, Don wrote:
>>>  # Start a set of programs running and then wait for a bit, to be sure
>>>  # that they can be attached to.  Return a list of the processes' PIDs.
>>>
>>>  proc spawn_wait_for_attach { executable_list } {
>>>      set pid_list {}
>>>
>>> +    if ![can_spawn_for_attach] {
>>> +	error "can't spawn for attach with this target/board"
>>> +    }
>>
>> Should this be calling "error", or should it call something like
>> "untested" or "unsupported", since it isn't expected to work in these cases?
> 
> The idea is that all .exp files that use spawn_wait_for_attach
> would have already checked can_spawn_for_attach early, and skipped the
> tests if false.  That makes is a test bug to see a call to
> spawn_wait_for_attach if can_spawn_for_attach is false.
> 
> I should have really split those hunks out to a separate patch and
> added calls to can_spawn_for_attach in all tests that are using
> spawn_wait_for_attach already.  Like below.  WDYT?
> 
> (There are probably other attach tests that don't use
> spawn_wait_for_attach that need the can_spawn_for_attach too.
> We can do this incrementally.)

I went ahead and pushed this to unblock the parent series.

> gdb/testsuite/
> 2015-01-09  Pedro Alves  <palves@redhat.com>
>
>         * lib/gdb.exp (can_spawn_for_attach): New procedure.
>         (spawn_wait_for_attach): Error out if can_spawn_for_attach returns
>         false.
> 	* gdb.base/attach.exp: Use can_spawn_for_attach instead of
> 	checking whether the target board is remote.
> 	* gdb.multi/multi-attach.exp: Likewise.
> 	* gdb.python/py-sync-interp.exp: Likewise.
> 	* gdb.server/ext-attach.exp: Likewise.
> 	* gdb.python/py-prompt.exp: Use can_spawn_for_attach before the
> 	tests that need to attach, instead of checking whether the target
> 	board is remote at the top of the file.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/5] GNU/Linux, fix attach races/problems
  2014-12-16 16:54 [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
                   ` (4 preceding siblings ...)
  2014-12-16 17:35 ` [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads Pedro Alves
@ 2015-01-09 12:03 ` Pedro Alves
  5 siblings, 0 replies; 24+ messages in thread
From: Pedro Alves @ 2015-01-09 12:03 UTC (permalink / raw)
  To: gdb-patches

On 12/16/2014 04:53 PM, Pedro Alves wrote:
> Investigating an "attach" bug seemingly caused by a race, related to
> the fact that on Linux we have to attach to each thread individually,
> I decided to write a test that stresses that aspect of attach.  The
> test constantly spawns short-lived threads, and has GDB attach, debug
> a little, detach, attach, rinse/repeat a few times.
> 
> That actually exposed a set of issues, both in GDB and in
> glibc/libthread_db.
> 
> One is that the test defeats the current heuristics in place: we still
> fail to attach to some threads sometimes, if the program spawns them
> quickly enough.  This is fixed by fetching the list of LWPs to attach
> to from /proc instead of relying on libthread_db for that.
> 
> Another is that some times we'd try to attach to a bogus lwp with
> PID==-1, and do a dangerous waitpid, potentially eating an event by
> mistake and breaking the debug session as result.
> 
> Yet another is a nasty libthread_db event reporting mechanism race
> related to detaching from the inferior just while a thread is
> reporting an event, resulting in a subsequent attach session tripping
> on broken libthread_db events.  We work around this by no longer using
> libthread_db for thread event creation/death reporting, which is good
> on its own, for being much more robust and efficient.
> 
> I've filed a couple glibc bugs as result of all this:
> 
>  Bug 17705 - nptl_db: stale thread create/death events if debugger detaches
>  https://sourceware.org/bugzilla/show_bug.cgi?id=17705
> 
>  Bug 17707 - nptl_db: terminated and joined threads
>  https://sourceware.org/bugzilla/show_bug.cgi?id=17707
> 
> The series fixes the GDB issues and at the same time works around the
> glibc issues.
> 
> Tested on x86_64 Fedora 20, native and gdbserver.
> 
> Comments?

I've pushed this in now.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [regression/native-gdbserver][buildbot] Python testscases get staled (was: Re: [PATCH] skip "attach" tests when testing against stub-like targets)
  2015-01-09 11:24           ` [PATCH] skip "attach" tests when testing against stub-like targets Pedro Alves
@ 2015-01-12  4:43             ` Sergio Durigan Junior
  2015-01-12 11:15               ` [regression/native-gdbserver][buildbot] Python testscases get staled Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Sergio Durigan Junior @ 2015-01-12  4:43 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Breazeal, Don, Yao Qi, gdb-patches

On Friday, January 09 2015, Pedro Alves wrote:

> On 01/07/2015 04:17 PM, Pedro Alves wrote:
>> On 01/05/2015 07:01 PM, Breazeal, Don wrote:
>>>>  # Start a set of programs running and then wait for a bit, to be sure
>>>>  # that they can be attached to.  Return a list of the processes' PIDs.
>>>>
>>>>  proc spawn_wait_for_attach { executable_list } {
>>>>      set pid_list {}
>>>>
>>>> +    if ![can_spawn_for_attach] {
>>>> +	error "can't spawn for attach with this target/board"
>>>> +    }
>>>
>>> Should this be calling "error", or should it call something like
>>> "untested" or "unsupported", since it isn't expected to work in these cases?
>> 
>> The idea is that all .exp files that use spawn_wait_for_attach
>> would have already checked can_spawn_for_attach early, and skipped the
>> tests if false.  That makes is a test bug to see a call to
>> spawn_wait_for_attach if can_spawn_for_attach is false.
>> 
>> I should have really split those hunks out to a separate patch and
>> added calls to can_spawn_for_attach in all tests that are using
>> spawn_wait_for_attach already.  Like below.  WDYT?
>> 
>> (There are probably other attach tests that don't use
>> spawn_wait_for_attach that need the can_spawn_for_attach too.
>> We can do this incrementally.)
>
> I went ahead and pushed this to unblock the parent series.

Hey Pedro,

Buildbot (which is running only internally so far, but will hopefully be
deployed this week) "caught" this when building in x86_64 (Fedora 21)
and testing the native-gdbserver variant (both x86_64 and x86).  When
you run the test on the gdb.python/ directory, you see that it stales
when it reaches the gdb.python/py-section-script.exp testcase.
Buildbot's gdb.log specifically has:

  Running ../../../binutils-gdb/gdb/testsuite/gdb.python/py-section-script.exp ...
  ERROR: (timeout) GDB never initialized after 10 seconds.
  ERROR: no fileid for gdbuild
  ERROR: Couldn't send python print ('test') to GDB.
  ERROR: no fileid for gdbuild
  ERROR: Couldn't send python print (sys.version_info[0]) to GDB.
  ERROR: no fileid for gdbuild
  ERROR: Couldn't send python print (sys.version_info[1]) to GDB.
  ERROR: no fileid for gdbuild
  ERROR: no fileid for gdbuild
  ...
  (this goes on and on, for several testcases)

I still did not debug this (intend to do so tomorrow, if you don't see
this before I start my day), but running a simple git-bisect showed me
that this specific commit is the culprit.

Cheers,

-- 
Sergio
GPG key ID: 0x65FC5E36
Please send encrypted e-mail if possible
http://sergiodj.net/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression/native-gdbserver][buildbot] Python testscases get staled
  2015-01-12  4:43             ` [regression/native-gdbserver][buildbot] Python testscases get staled (was: Re: [PATCH] skip "attach" tests when testing against stub-like targets) Sergio Durigan Junior
@ 2015-01-12 11:15               ` Pedro Alves
  2015-01-12 16:55                 ` Sergio Durigan Junior
  0 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2015-01-12 11:15 UTC (permalink / raw)
  To: Sergio Durigan Junior; +Cc: Breazeal, Don, Yao Qi, gdb-patches

On 01/12/2015 04:43 AM, Sergio Durigan Junior wrote:
> 
> Buildbot (which is running only internally so far, but will hopefully be
> deployed this week) "caught" this when building in x86_64 (Fedora 21)
> and testing the native-gdbserver variant (both x86_64 and x86).  When
> you run the test on the gdb.python/ directory, you see that it stales
> when it reaches the gdb.python/py-section-script.exp testcase.
> Buildbot's gdb.log specifically has:
> 
>   Running ../../../binutils-gdb/gdb/testsuite/gdb.python/py-section-script.exp ...
>   ERROR: (timeout) GDB never initialized after 10 seconds.
>   ERROR: no fileid for gdbuild
>   ERROR: Couldn't send python print ('test') to GDB.
>   ERROR: no fileid for gdbuild
>   ERROR: Couldn't send python print (sys.version_info[0]) to GDB.
>   ERROR: no fileid for gdbuild
>   ERROR: Couldn't send python print (sys.version_info[1]) to GDB.
>   ERROR: no fileid for gdbuild
>   ERROR: no fileid for gdbuild
>   ...
>   (this goes on and on, for several testcases)
> 

Here's what I get:

$ make check RUNTESTFLAGS="--target_board=native-gdbserver gdb.python/py-section-script.exp"
...
Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.python/py-section-script.exp ...

                === gdb Summary ===

# of expected passes            7
...

> I still did not debug this (intend to do so tomorrow, if you don't see
> this before I start my day), but running a simple git-bisect showed me
> that this specific commit is the culprit.

Odd, my commit did not touch this file.  Actually, the test does not
use "attach" at all.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression/native-gdbserver][buildbot] Python testscases get staled
  2015-01-12 11:15               ` [regression/native-gdbserver][buildbot] Python testscases get staled Pedro Alves
@ 2015-01-12 16:55                 ` Sergio Durigan Junior
  2015-01-12 17:01                   ` Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Sergio Durigan Junior @ 2015-01-12 16:55 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Breazeal, Don, Yao Qi, gdb-patches

On Monday, January 12 2015, Pedro Alves wrote:

> On 01/12/2015 04:43 AM, Sergio Durigan Junior wrote:
>> 
>> Buildbot (which is running only internally so far, but will hopefully be
>> deployed this week) "caught" this when building in x86_64 (Fedora 21)
>> and testing the native-gdbserver variant (both x86_64 and x86).  When
>> you run the test on the gdb.python/ directory, you see that it stales
>> when it reaches the gdb.python/py-section-script.exp testcase.
>> Buildbot's gdb.log specifically has:
>> 
>>   Running ../../../binutils-gdb/gdb/testsuite/gdb.python/py-section-script.exp ...
>>   ERROR: (timeout) GDB never initialized after 10 seconds.
>>   ERROR: no fileid for gdbuild
>>   ERROR: Couldn't send python print ('test') to GDB.
>>   ERROR: no fileid for gdbuild
>>   ERROR: Couldn't send python print (sys.version_info[0]) to GDB.
>>   ERROR: no fileid for gdbuild
>>   ERROR: Couldn't send python print (sys.version_info[1]) to GDB.
>>   ERROR: no fileid for gdbuild
>>   ERROR: no fileid for gdbuild
>>   ...
>>   (this goes on and on, for several testcases)
>> 
>
> Here's what I get:
>
> $ make check RUNTESTFLAGS="--target_board=native-gdbserver gdb.python/py-section-script.exp"
> ...
> Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.python/py-section-script.exp ...
>
>                 === gdb Summary ===
>
> # of expected passes            7
> ...

Running the testcase alone seems to be fine.  The problem happens when
you run the entire gdb.python/ directory:

  make check RUNTESTFLAGS="--target_board=native-gdbserver --directory=gdb.python"

-- 
Sergio
GPG key ID: 0x65FC5E36
Please send encrypted e-mail if possible
http://sergiodj.net/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [regression/native-gdbserver][buildbot] Python testscases get staled
  2015-01-12 16:55                 ` Sergio Durigan Junior
@ 2015-01-12 17:01                   ` Pedro Alves
  2015-01-12 17:13                     ` [PATCH] gdb.python/py-prompt.exp: restore GDBFLAGS Pedro Alves
  0 siblings, 1 reply; 24+ messages in thread
From: Pedro Alves @ 2015-01-12 17:01 UTC (permalink / raw)
  To: Sergio Durigan Junior; +Cc: Breazeal, Don, Yao Qi, gdb-patches

On 01/12/2015 04:55 PM, Sergio Durigan Junior wrote:

> Running the testcase alone seems to be fine.  The problem happens when
> you run the entire gdb.python/ directory:
> 
>   make check RUNTESTFLAGS="--target_board=native-gdbserver --directory=gdb.python"

Ah, yes.  I see the bug.  Will push a fix in a bit.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] gdb.python/py-prompt.exp: restore GDBFLAGS
  2015-01-12 17:01                   ` Pedro Alves
@ 2015-01-12 17:13                     ` Pedro Alves
  0 siblings, 0 replies; 24+ messages in thread
From: Pedro Alves @ 2015-01-12 17:13 UTC (permalink / raw)
  To: Sergio Durigan Junior; +Cc: Breazeal, Don, Yao Qi, gdb-patches

On 01/12/2015 05:00 PM, Pedro Alves wrote:
> On 01/12/2015 04:55 PM, Sergio Durigan Junior wrote:
> 
>> Running the testcase alone seems to be fine.  The problem happens when
>> you run the entire gdb.python/ directory:
>>
>>   make check RUNTESTFLAGS="--target_board=native-gdbserver --directory=gdb.python"
> 
> Ah, yes.  I see the bug.  Will push a fix in a bit.

Pushed now.  Thanks!

-----
Subject: [PATCH] gdb.python/py-prompt.exp: restore GDBFLAGS

The previous change to py-prompt.exp made it return without restoring
GDBFLAGS, resulting in breaking the following tests:

  $ make check RUNTESTFLAGS="--target_board=native-gdbserver --directory=gdb.python"
  ...
  Running src/gdb/testsuite/gdb.python/py-prompt.exp ...
  Running src/gdb/testsuite/gdb.python/py-section-script.exp ...
  ERROR: (timeout) GDB never initialized after 10 seconds.
  ERROR: no fileid for gdbuild
  ERROR: Couldn't send python print ('test') to GDB.
  ERROR: no fileid for gdbuild
  ERROR: Couldn't send python print (sys.version_info[0]) to GDB.
  ERROR: no fileid for gdbuild
  ERROR: Couldn't send python print (sys.version_info[1]) to GDB.
  ERROR: no fileid for gdbuild
  ERROR: no fileid for gdbuild
  ...

gdb/testsuite/
2015-01-12  Pedro Alves  <palves@redhat.com>

	* gdb.python/py-prompt.exp: When the board can't spawn for attach,
	restore GDBFLAGS before returning.
---
 gdb/testsuite/ChangeLog                | 5 +++++
 gdb/testsuite/gdb.python/py-prompt.exp | 1 +
 2 files changed, 6 insertions(+)

diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index 7343d03..db1f521 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-12  Pedro Alves  <palves@redhat.com>
+
+	* gdb.python/py-prompt.exp: When the board can't spawn for attach,
+	restore GDBFLAGS before returning.
+
 2015-01-12  Jan Kratochvil  <jan.kratochvil@redhat.com>

 	* gdb.python/py-frame.exp (test Frame.read_register(rip)): Use
diff --git a/gdb/testsuite/gdb.python/py-prompt.exp b/gdb/testsuite/gdb.python/py-prompt.exp
index 55f0f59..0f7a3a4 100644
--- a/gdb/testsuite/gdb.python/py-prompt.exp
+++ b/gdb/testsuite/gdb.python/py-prompt.exp
@@ -76,6 +76,7 @@ gdb_test "python print (\"'\" + str(p\[0\]) + \"'\")" "'$gdb_prompt_fail '" \
 gdb_exit

 if {![can_spawn_for_attach]} {
+    set GDBFLAGS $saved_gdbflags
     return 0
 }

-- 
1.9.3


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-01-12 17:13 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-16 16:54 [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves
2014-12-16 16:54 ` [PATCH 3/5] libthread_db: Skip attaching to terminated and joined threads Pedro Alves
2014-12-16 16:54 ` [PATCH 1/5] libthread_db: debug output should go to gdb_stdlog Pedro Alves
2014-12-17  8:02   ` Yao Qi
2014-12-17 13:45     ` Pedro Alves
2014-12-17 14:09       ` Yao Qi
2014-12-16 16:54 ` [PATCH 4/5] Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported Pedro Alves
2014-12-16 21:24   ` Simon Marchi
2014-12-17 13:04     ` Pedro Alves
2014-12-16 16:54 ` [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/ Pedro Alves
2014-12-16 20:52   ` Simon Marchi
2014-12-17 13:35     ` Pedro Alves
2014-12-16 17:35 ` [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads Pedro Alves
2014-12-17 11:10   ` Yao Qi
2014-12-18  0:02     ` Pedro Alves
2015-01-05 19:02       ` Breazeal, Don
2015-01-07 16:17         ` [PATCH] skip "attach" tests when testing against stub-like targets (was: Re: [PATCH 5/5] Test attaching to a program that constantly spawns short-lived threads) Pedro Alves
2015-01-09 11:24           ` [PATCH] skip "attach" tests when testing against stub-like targets Pedro Alves
2015-01-12  4:43             ` [regression/native-gdbserver][buildbot] Python testscases get staled (was: Re: [PATCH] skip "attach" tests when testing against stub-like targets) Sergio Durigan Junior
2015-01-12 11:15               ` [regression/native-gdbserver][buildbot] Python testscases get staled Pedro Alves
2015-01-12 16:55                 ` Sergio Durigan Junior
2015-01-12 17:01                   ` Pedro Alves
2015-01-12 17:13                     ` [PATCH] gdb.python/py-prompt.exp: restore GDBFLAGS Pedro Alves
2015-01-09 12:03 ` [PATCH 0/5] GNU/Linux, fix attach races/problems Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).