public inbox for gdb-patches@sourceware.org
* [PATCH 1/6] Linux native thread create/exit events support
  2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
  2016-05-19 14:48 ` [PATCH 3/6] [Linux] Avoid refetching core-of-thread if thread hasn't run Pedro Alves
  2016-05-19 14:48 ` [PATCH 2/6] [Linux] Read vDSO range from /proc/PID/task/PID/maps instead of /proc/PID/maps Pedro Alves
@ 2016-05-19 14:48 ` Pedro Alves
  2016-05-23 10:49   ` Yao Qi
  2016-05-19 14:56 ` [PATCH 5/6] Make gdb/linux-nat.c consider a waitstatus pending on the infrun side Pedro Alves
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Pedro Alves @ 2016-05-19 14:48 UTC (permalink / raw)
  To: gdb-patches

A follow-up patch (the fix for gdb/19828) makes linux-nat.c add threads
to GDB's thread list earlier in the "attach" sequence, and that causes a
surprising regression on
gdb.threads/attach-many-short-lived-threads.exp on my machine.  The
extra "thread x exited" handling and traffic slows down that test
enough that GDB core has trouble keeping up with new threads that are
spawned while trying to stop existing ones.

I saw the exact same issue with remote/gdbserver a while ago and fixed
it in 65706a29bac5 (Remote thread create/exit events), so part of the
fix here is the same -- add support for thread creation events to
gdb/linux-nat.c.  infrun.c:stop_all_threads enables those events when
it tries to stop threads, which ensures that new threads never get a
chance to start new threads themselves, thus fixing the race.

gdb/
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* linux-nat.c (report_thread_events): New global.
	(linux_handle_extended_wait): Report
	TARGET_WAITKIND_THREAD_CREATED if thread event reporting is
	enabled.
	(wait_lwp, linux_nat_filter_event): Report all thread exits if
	thread event reporting is enabled.  Update comments.
	(filter_exit_event): New function.
	(linux_nat_wait_1): Use it.
	(linux_nat_thread_events): New function.
	(linux_nat_add_target): Install it as target_thread_events method.
---
 gdb/linux-nat.c       | 58 +++++++++++++++++++++++++++++++++++++++++++++------
 gdb/linux-thread-db.c | 14 +++++++------
 2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index edde88d..5ec56c1 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -239,6 +239,9 @@ struct simple_pid_list
 };
 struct simple_pid_list *stopped_pids;
 
+/* Whether target_thread_events is in effect.  */
+static int report_thread_events;
+
 /* Async mode support.  */
 
 /* The read/write ends of the pipe registered as waitable file in the
@@ -1952,6 +1955,11 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
 				    status_to_str (status));
 	      new_lp->status = status;
 	    }
+	  else if (report_thread_events)
+	    {
+	      new_lp->waitstatus.kind = TARGET_WAITKIND_THREAD_CREATED;
+	      new_lp->status = status;
+	    }
 
 	  return 1;
 	}
@@ -2091,13 +2099,14 @@ wait_lwp (struct lwp_info *lp)
       /* Check if the thread has exited.  */
       if (WIFEXITED (status) || WIFSIGNALED (status))
 	{
-	  if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
+	  if (report_thread_events
+	      || ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
 	    {
 	      if (debug_linux_nat)
-		fprintf_unfiltered (gdb_stdlog, "WL: Process %d exited.\n",
+		fprintf_unfiltered (gdb_stdlog, "WL: LWP %d exited.\n",
 				    ptid_get_pid (lp->ptid));
 
-	      /* This is the leader exiting, it means the whole
+	      /* If this is the leader exiting, it means the whole
 		 process is gone.  Store the status to report to the
 		 core.  Store it in lp->waitstatus, because lp->status
 		 would be ambiguous (W_EXITCODE(0,0) == 0).  */
@@ -2902,7 +2911,8 @@ linux_nat_filter_event (int lwpid, int status)
   /* Check if the thread has exited.  */
   if (WIFEXITED (status) || WIFSIGNALED (status))
     {
-      if (num_lwps (ptid_get_pid (lp->ptid)) > 1)
+      if (!report_thread_events
+	  && num_lwps (ptid_get_pid (lp->ptid)) > 1)
 	{
 	  if (debug_linux_nat)
 	    fprintf_unfiltered (gdb_stdlog,
@@ -2922,10 +2932,10 @@ linux_nat_filter_event (int lwpid, int status)
 	 resumed.  */
       if (debug_linux_nat)
 	fprintf_unfiltered (gdb_stdlog,
-			    "Process %ld exited (resumed=%d)\n",
+			    "LWP %ld exited (resumed=%d)\n",
 			    ptid_get_lwp (lp->ptid), lp->resumed);
 
-      /* This was the last lwp in the process.  Since events are
+      /* This may be the last lwp in the process.  Since events are
 	 serialized to GDB core, we may not be able report this one
 	 right now, but GDB core and the other target layers will want
 	 to be notified about the exit code/signal, leave the status
@@ -3110,6 +3120,30 @@ check_zombie_leaders (void)
     }
 }
 
+/* Convenience function that is called when the kernel reports an exit
+   event.  This decides whether to report the event to GDB as a
+   process exit event, a thread exit event, or to suppress the
+   event.  */
+
+static ptid_t
+filter_exit_event (struct lwp_info *event_child,
+		   struct target_waitstatus *ourstatus)
+{
+  ptid_t ptid = event_child->ptid;
+
+  if (num_lwps (ptid_get_pid (ptid)) > 1)
+    {
+      if (report_thread_events)
+	ourstatus->kind = TARGET_WAITKIND_THREAD_EXITED;
+      else
+	ourstatus->kind = TARGET_WAITKIND_IGNORE;
+
+      exit_lwp (event_child);
+    }
+
+  return ptid;
+}
+
 static ptid_t
 linux_nat_wait_1 (struct target_ops *ops,
 		  ptid_t ptid, struct target_waitstatus *ourstatus,
@@ -3339,6 +3373,9 @@ linux_nat_wait_1 (struct target_ops *ops,
   else
     lp->core = linux_common_core_of_thread (lp->ptid);
 
+  if (ourstatus->kind == TARGET_WAITKIND_EXITED)
+    return filter_exit_event (lp, ourstatus);
+
   return lp->ptid;
 }
 
@@ -4614,6 +4651,14 @@ linux_nat_fileio_unlink (struct target_ops *self,
   return ret;
 }
 
+/* Implementation of the to_thread_events method.  */
+
+static void
+linux_nat_thread_events (struct target_ops *ops, int enable)
+{
+  report_thread_events = enable;
+}
+
 void
 linux_nat_add_target (struct target_ops *t)
 {
@@ -4646,6 +4691,7 @@ linux_nat_add_target (struct target_ops *t)
   t->to_supports_stopped_by_sw_breakpoint = linux_nat_supports_stopped_by_sw_breakpoint;
   t->to_stopped_by_hw_breakpoint = linux_nat_stopped_by_hw_breakpoint;
   t->to_supports_stopped_by_hw_breakpoint = linux_nat_supports_stopped_by_hw_breakpoint;
+  t->to_thread_events = linux_nat_thread_events;
 
   t->to_can_async_p = linux_nat_can_async_p;
   t->to_is_async_p = linux_nat_is_async_p;
diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c
index 844b05c..fe46062 100644
--- a/gdb/linux-thread-db.c
+++ b/gdb/linux-thread-db.c
@@ -1118,12 +1118,14 @@ thread_db_wait (struct target_ops *ops,
 
   ptid = beneath->to_wait (beneath, ptid, ourstatus, options);
 
-  if (ourstatus->kind == TARGET_WAITKIND_IGNORE)
-    return ptid;
-
-  if (ourstatus->kind == TARGET_WAITKIND_EXITED
-      || ourstatus->kind == TARGET_WAITKIND_SIGNALLED)
-    return ptid;
+  switch (ourstatus->kind)
+    {
+    case TARGET_WAITKIND_IGNORE:
+    case TARGET_WAITKIND_EXITED:
+    case TARGET_WAITKIND_THREAD_EXITED:
+    case TARGET_WAITKIND_SIGNALLED:
+      return ptid;
+    }
 
   info = get_thread_db_info (ptid_get_pid (ptid));
 
-- 
2.5.5

* [PATCH 2/6] [Linux] Read vDSO range from /proc/PID/task/PID/maps instead of /proc/PID/maps
  2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
  2016-05-19 14:48 ` [PATCH 3/6] [Linux] Avoid refetching core-of-thread if thread hasn't run Pedro Alves
@ 2016-05-19 14:48 ` Pedro Alves
  2016-05-23 11:08   ` Yao Qi
  2016-05-19 14:48 ` [PATCH 1/6] Linux native thread create/exit events support Pedro Alves
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Pedro Alves @ 2016-05-19 14:48 UTC (permalink / raw)
  To: gdb-patches

... as it's _much_ faster.

Hacking the gdb.threads/attach-many-short-lived-threads.exp test to
spawn thousands of threads instead of dozens, to stress and debug its
timeout problems, I saw that GDB would spend several seconds just
reading the
/proc/PID/smaps file, to determine the vDSO mapping range.  GDB opens
and reads the whole file just once, and caches the result, but even
that is too slow.  For example, with almost 8000 threads:

 $ ls /proc/3518/task/ | wc -l
 7906

reading the /proc/PID/smaps file, grepping for "vdso", takes over 15
seconds:

 $ time cat /proc/3518/smaps | grep vdso
 7ffdbafee000-7ffdbaff0000 r-xp 00000000 00:00 0                          [vdso]

 real    0m15.371s
 user    0m0.008s
 sys     0m15.017s

Looking around the web for hints, I found a nice description of the
issue here:

 http://backtrace.io/blog/blog/2014/11/12/large-thread-counts-and-slow-process-maps/

The problem is that /proc/PID/smaps wants to show the mappings as
being thread stacks, and that has the kernel iterating over all threads
in the thread group, for each mapping.

The fix is to read the file under /proc/PID/task/PID/ instead of the
/proc/PID/ one, as the former doesn't mark thread stacks for all
threads.

That alone drops the timing to the millisecond range on my machine:

 $ time cat /proc/3518/task/3518/smaps | grep vdso
 7ffdbafee000-7ffdbaff0000 r-xp 00000000 00:00 0                          [vdso]

 real    0m0.150s
 user    0m0.009s
 sys     0m0.084s

And since we only need the vdso mapping's address range, we can use
the "maps" file instead of "smaps", which is even cheaper:

/proc/PID/task/PID/maps:

 $ time cat /proc/3518/task/3518/maps | grep vdso
 7ffdbafee000-7ffdbaff0000 r-xp 00000000 00:00 0                          [vdso]

 real    0m0.027s
 user    0m0.000s
 sys     0m0.017s
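
For illustration, here is a standalone sketch of that parsing done with
plain libc, outside GDB (the function name is made up for this example;
the actual patch reads the file through target_fileio_read_stralloc and
matches the mapping by the start address obtained from AT_SYSINFO_EHDR,
whereas this sketch simply looks for the "[vdso]" name):

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>

 /* Find the [vdso] mapping of PID by reading the per-task "maps" file.
    Returns 1 and fills *START/*LENGTH on success, 0 otherwise.  */
 static int
 find_vdso_range (long pid, unsigned long long *start,
                  unsigned long long *length)
 {
   char path[64], line[256];
   FILE *f;
   int found = 0;

   /* Per-task file, as in the patch, to avoid the slow path.  */
   snprintf (path, sizeof path, "/proc/%ld/task/%ld/maps", pid, pid);
   f = fopen (path, "r");
   if (f == NULL)
     return 0;

   while (fgets (line, sizeof line, f) != NULL)
     if (strstr (line, "[vdso]") != NULL)
       {
         unsigned long long lo, hi;

         if (sscanf (line, "%llx-%llx", &lo, &hi) == 2)
           {
             *start = lo;
             *length = hi - lo;
             found = 1;
           }
         break;
       }

   fclose (f);
   return found;
 }

 int
 main (void)
 {
   unsigned long long start, len;

   if (find_vdso_range ((long) getpid (), &start, &len))
     printf ("vdso at 0x%llx, %llu bytes\n", start, len);
   return 0;
 }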

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* linux-tdep.c (find_mapping_size): Delete.
	(linux_vsyscall_range_raw): Rewrite reading from
	/proc/PID/task/PID/maps directly instead of using
	gdbarch_find_memory_regions.
---
 gdb/linux-tdep.c | 77 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 54 insertions(+), 23 deletions(-)

diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
index b8d063f..ab110b0 100644
--- a/gdb/linux-tdep.c
+++ b/gdb/linux-tdep.c
@@ -2277,39 +2277,70 @@ linux_gdb_signal_to_target (struct gdbarch *gdbarch,
   return -1;
 }
 
-/* Rummage through mappings to find a mapping's size.  */
-
-static int
-find_mapping_size (CORE_ADDR vaddr, unsigned long size,
-		   int read, int write, int exec, int modified,
-		   void *data)
-{
-  struct mem_range *range = (struct mem_range *) data;
-
-  if (vaddr == range->start)
-    {
-      range->length = size;
-      return 1;
-    }
-  return 0;
-}
-
 /* Helper for linux_vsyscall_range that does the real work of finding
    the vsyscall's address range.  */
 
 static int
 linux_vsyscall_range_raw (struct gdbarch *gdbarch, struct mem_range *range)
 {
+  char filename[100];
+  long pid;
+  char *data;
+
+  /* Can't access /proc if debugging a core file.  */
+  if (!target_has_execution)
+    return 0;
+
+  /* We need to know the real target PID to access /proc.  */
+  if (current_inferior ()->fake_pid_p)
+    return 0;
+
   if (target_auxv_search (&current_target, AT_SYSINFO_EHDR, &range->start) <= 0)
     return 0;
 
-  /* This is installed by linux_init_abi below, so should always be
-     available.  */
-  gdb_assert (gdbarch_find_memory_regions_p (target_gdbarch ()));
+  pid = current_inferior ()->pid;
 
-  range->length = 0;
-  gdbarch_find_memory_regions (gdbarch, find_mapping_size, range);
-  return 1;
+  /* Note that reading /proc/PID/task/PID/maps (1) is much faster than
+     reading /proc/PID/maps (2).  The latter identifies thread stacks
+     in the output, which requires scanning every thread in the thread
+     group to check whether a VMA is actually a thread's stack.  With
+     Linux 4.4 on an Intel i7-4810MQ @ 2.80GHz, with an inferior with
+     a few thousand threads, (1) takes a few milliseconds, while (2)
+     takes several seconds.  Also note that "smaps", what we read for
+     determining core dump mappings, is even slower than "maps".  */
+  xsnprintf (filename, sizeof filename, "/proc/%ld/task/%ld/maps", pid, pid);
+  data = target_fileio_read_stralloc (NULL, filename);
+  if (data != NULL)
+    {
+      struct cleanup *cleanup = make_cleanup (xfree, data);
+      char *line;
+      char *saveptr = NULL;
+
+      for (line = strtok_r (data, "\n", &saveptr);
+	   line != NULL;
+	   line = strtok_r (NULL, "\n", &saveptr))
+	{
+	  ULONGEST addr, endaddr;
+	  const char *p = line;
+
+	  addr = strtoulst (p, &p, 16);
+	  if (addr == range->start)
+	    {
+	      if (*p == '-')
+		p++;
+	      endaddr = strtoulst (p, &p, 16);
+	      range->length = endaddr - addr;
+	      do_cleanups (cleanup);
+	      return 1;
+	    }
+	}
+
+      do_cleanups (cleanup);
+    }
+  else
+    warning (_("unable to open /proc file '%s'"), filename);
+
+  return 0;
 }
 
 /* Implementation of the "vsyscall_range" gdbarch hook.  Handles
-- 
2.5.5

* [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations
@ 2016-05-19 14:48 Pedro Alves
  2016-05-19 14:48 ` [PATCH 3/6] [Linux] Avoid refetching core-of-thread if thread hasn't run Pedro Alves
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Pedro Alves @ 2016-05-19 14:48 UTC (permalink / raw)
  To: gdb-patches

The goal of this series is to fix PR gdb/19828.  In this series, I put
the fix for PR gdb/19828 last, but it actually fixes the bug on its
own.  However, that fix makes GDB add LWPs to the thread list sooner
and that ends up changing code paths in the attach sequence and
exposing inefficiencies that in turn result in
gdb.threads/attach-many-short-lived-threads.exp timing out almost
always on my machine...

So the first four patches of the series are about preemptively fixing
those timeouts.

To make it easier to debug these inefficiencies, I applied patch #6
first, and then amplified the problem by hacking the
gdb.threads/attach-many-short-lived-threads.exp test program to spawn
thousands (2k-10k) of threads instead of dozens.

Then I simply ran the program in one terminal, and had gdb attach to
it from another terminal.

Without the other fixes, GDB would just never manage to finish
attaching to all LWPs.  With the fixes, GDB takes a few seconds to
fully attach.

So I suspect that this series ends up fixing the
gdb.threads/attach-many-short-lived-threads.exp racy timeouts seen on
some of the (probably slower) build slaves of our buildbot setup.

I have some other ideas to try as follow up, that I'll likely end up
exploring at some point, but I'll stop here for now.

(E.g., try group-stopping the process with SIGSTOP to have the kernel
freeze the whole thread group, though I think it may turn ugly and I
kind of dislike it as we should instead be moving in the direction of
PTRACE_SEIZE / PTRACE_INTERRUPT.)

Fedora 23 is carrying a simpler GDB fix for PR gdb/19828 that avoids
all these optimizations, but at the expense of leaving "attach&"
(background attach) still broken.  That might be safer to put in
7.11.1, but then again I dislike the idea of leaving attach& broken.
I'll find and post that patch for you all to see what I mean.

Pedro Alves (6):
  Linux native thread create/exit events support
  [Linux] Read vDSO range from /proc/PID/task/PID/maps instead of
    /proc/PID/maps
  [Linux] Avoid refetching core-of-thread if thread hasn't run
  [Linux] Optimize PID -> struct lwp_info lookup
  Make gdb/linux-nat.c consider a waitstatus pending on the infrun side
  Fix PR gdb/19828: gdb -p <process from a container>: internal error

 gdb/linux-nat.c                                   | 241 ++++++++++++++++++----
 gdb/linux-nat.h                                   |   4 +-
 gdb/linux-tdep.c                                  |  77 ++++---
 gdb/linux-thread-db.c                             |  14 +-
 gdb/testsuite/gdb.threads/clone-attach-detach.c   |  66 ++++++
 gdb/testsuite/gdb.threads/clone-attach-detach.exp |  98 +++++++++
 6 files changed, 428 insertions(+), 72 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/clone-attach-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/clone-attach-detach.exp

-- 
2.5.5

* [PATCH 3/6] [Linux] Avoid refetching core-of-thread if thread hasn't run
  2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
@ 2016-05-19 14:48 ` Pedro Alves
  2016-05-23 12:45   ` Yao Qi
  2016-05-19 14:48 ` [PATCH 2/6] [Linux] Read vDSO range from /proc/PID/task/PID/maps instead of /proc/PID/maps Pedro Alves
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Pedro Alves @ 2016-05-19 14:48 UTC (permalink / raw)
  To: gdb-patches

Hacking the gdb.threads/attach-many-short-lived-threads.exp test to
spawn thousands of threads instead of dozens, I saw GDB having trouble
keeping up with threads being spawned too fast, when it tried to stop
them all.  This was because while GDB is doing that, it updates the
thread list to make sure no new thread has sneaked in that might need
to be paused.  It does this a few times until it sees no new threads
twice in a row.  The thread list update itself is not that expensive;
however, in the Linux backend, updating the thread list calls
linux_common_core_of_thread for each LWP to record on which core each
LWP was last seen running, and that opens/reads/closes a /proc file
per LWP, which becomes expensive when you need to do it for thousands
of LWPs.

perf shows GDB spending 44% of its time in linux_common_core_of_thread,
in the stop_all_threads -> update_thread_list path, in this use case.

This patch simply avoids refetching the core a thread was last seen
running on if the thread hasn't run since the last time we fetched
that info.  This makes linux_common_core_of_thread disappear into the
noise in the perf report.
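
For reference, the core a thread last ran on is the "processor" field
(the 39th, per proc(5)) of /proc/PID/task/TID/stat, which is what
linux_common_core_of_thread parses.  A minimal standalone sketch of
that lookup (an illustration only, not the GDB helper itself):

 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>

 /* Return the CPU that thread TID of process PID last ran on, or -1.  */
 static int
 core_of_thread (long pid, long tid)
 {
   char path[64], buf[1024];
   FILE *f;
   size_t n;
   char *p;
   int i, core = -1;

   snprintf (path, sizeof path, "/proc/%ld/task/%ld/stat", pid, tid);
   f = fopen (path, "r");
   if (f == NULL)
     return -1;
   n = fread (buf, 1, sizeof buf - 1, f);
   fclose (f);
   buf[n] = '\0';

   /* The comm field (2nd) is parenthesized and may itself contain
      spaces, so skip past the last ')' and then walk space-separated
      fields up to the "processor" field, the 39th one overall.  */
   p = strrchr (buf, ')');
   if (p == NULL)
     return -1;
   for (i = 0; i < 37 && p != NULL; i++)
     p = strchr (p + 1, ' ');
   if (p != NULL)
     sscanf (p, " %d", &core);
   return core;
 }

 int
 main (void)
 {
   long pid = (long) getpid ();

   printf ("last ran on core %d\n", core_of_thread (pid, pid));
   return 0;
 }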

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (linux_resume_one_lwp_throw): Clear the LWP's core
	field.
	(linux_nat_update_thread_list): Don't fetch the core if already
	known.
---
 gdb/linux-nat.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 5ec56c1..509212e 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1432,6 +1432,7 @@ linux_resume_one_lwp_throw (struct lwp_info *lp, int step,
      status.  Note that we must not throw after this is cleared,
      otherwise handle_zombie_lwp_error would get confused.  */
   lp->stopped = 0;
+  lp->core = -1;
   lp->stop_reason = TARGET_STOPPED_BY_NO_REASON;
   registers_changed_ptid (lp->ptid);
 }
@@ -3791,7 +3792,13 @@ linux_nat_update_thread_list (struct target_ops *ops)
   /* Update the processor core that each lwp/thread was last seen
      running on.  */
   ALL_LWPS (lwp)
-    lwp->core = linux_common_core_of_thread (lwp->ptid);
+    {
+      /* Avoid accessing /proc if the thread hasn't run since the last
+	 time we fetched the thread's core.  Accessing /proc becomes
+	 noticeably expensive when we have thousands of LWPs.  */
+      if (lwp->core == -1)
+	lwp->core = linux_common_core_of_thread (lwp->ptid);
+    }
 }
 
 static char *
-- 
2.5.5

* [PATCH 5/6] Make gdb/linux-nat.c consider a waitstatus pending on the infrun side
  2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
                   ` (2 preceding siblings ...)
  2016-05-19 14:48 ` [PATCH 1/6] Linux native thread create/exit events support Pedro Alves
@ 2016-05-19 14:56 ` Pedro Alves
  2016-05-19 14:57 ` [PATCH 6/6] Fix PR gdb/19828: gdb -p <process from a container>: internal error Pedro Alves
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Pedro Alves @ 2016-05-19 14:56 UTC (permalink / raw)
  To: gdb-patches

Working on the fix for gdb/19828, I saw
gdb.threads/attach-many-short-lived-threads.exp fail once in an
unusual way.  Unfortunately I didn't keep debug logs, but it's an
issue similar to what was fixed in remote.c a while ago --
linux-nat.c was not fetching the pending status from the right place.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* linux-nat.c (get_pending_status): If the thread reported the
	event to the core and it's pending, use the pending status signal
	number.
---
 gdb/linux-nat.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index ea2e4dc..f2bdddc 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1330,7 +1330,10 @@ get_pending_status (struct lwp_info *lp, int *status)
     {
       struct thread_info *tp = find_thread_ptid (lp->ptid);
 
-      signo = tp->suspend.stop_signal;
+      if (tp->suspend.waitstatus_pending_p)
+	signo = tp->suspend.waitstatus.value.sig;
+      else
+	signo = tp->suspend.stop_signal;
     }
   else if (!target_is_non_stop_p ())
     {
-- 
2.5.5

* [PATCH 4/6] [Linux] Optimize PID -> struct lwp_info lookup
  2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
                   ` (4 preceding siblings ...)
  2016-05-19 14:57 ` [PATCH 6/6] Fix PR gdb/19828: gdb -p <process from a container>: internal error Pedro Alves
@ 2016-05-19 14:57 ` Pedro Alves
  2016-05-23 13:12   ` Yao Qi
  2016-05-19 15:11 ` [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
  2016-05-24 13:57 ` Pedro Alves
  7 siblings, 1 reply; 19+ messages in thread
From: Pedro Alves @ 2016-05-19 14:57 UTC (permalink / raw)
  To: gdb-patches

Hacking the gdb.threads/attach-many-short-lived-threads.exp test to
spawn thousands of threads instead of dozens, and running gdb under
perf, I saw that GDB was spending most of the time in find_lwp_pid:

   - captured_main
      - 93.61% catch_command_errors
         - 87.41% attach_command
            - 87.40% linux_nat_attach
               - 87.40% linux_proc_attach_tgid_threads
                  - 82.38% attach_proc_task_lwp_callback
                     - 81.01% find_lwp_pid
                          5.30% ptid_get_lwp
                        + 0.10% ptid_lwp_p
                     + 0.64% add_thread
                     + 0.26% set_running
                     + 0.24% set_executing
                       0.12% ptid_get_lwp
                     + 0.01% ptrace
                     + 0.01% add_lwp

attach_proc_task_lwp_callback is called once for each LWP that we
attach to, found by listing the /proc/PID/task/ directory.  In turn,
attach_proc_task_lwp_callback calls find_lwp_pid to check whether the
LWP we're about to try to attach to is already known.  Since
find_lwp_pid does a linear walk over the whole LWP list, this becomes
quadratic.  We do the /proc/PID/task/ listing until we get two
iterations in a row where we find no new threads.  So the second and
following times we walk the /proc/PID/task/ dir, we're going to take
an even worse find_lwp_pid hit.

Fix this by keeping a list of LWPs sorted by PID, so we can use binary
searches for lookup.

The linked list embedded in the LWP structure itself is kept, and made
a double-linked list, so that removals from that list are O(1).  An
earlier version of this patch got rid of this list altogether, but
that revealed hidden dependencies / assumptions on how the list is
sorted.  For example, killing a process and then waiting for all the
LWPs' statuses using iterate_over_lwps only works as is because the
leader LWP is always last in the list.  So I thought it better to take
an incremental approach and make this patch concern itself _only_ with
the PID lookup optimization.
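
Stripped of GDB's VEC machinery, the lookup half of the scheme boils
down to a bsearch over an array kept sorted by lwpid (insertion in the
patch keeps the vector sorted via VEC_lower_bound).  A minimal
standalone sketch, with the struct and function names invented for the
example:

 #include <stdlib.h>

 struct lwp
 {
   long lwpid;
   /* ... other per-LWP state ... */
 };

 /* bsearch callback: KEY is a long lwpid, ELT an element of an array
    of struct lwp sorted by ascending lwpid.  */
 static int
 lwpid_cmp (const void *key, const void *elt)
 {
   long k = *(const long *) key;
   long e = ((const struct lwp *) elt)->lwpid;

   return (k > e) - (k < e);
 }

 /* O(log N) replacement for the linear find_lwp_pid walk.  */
 static struct lwp *
 find_lwp (struct lwp *lwps, size_t nlwps, long lwpid)
 {
   return (struct lwp *) bsearch (&lwpid, lwps, nlwps, sizeof *lwps,
                                  lwpid_cmp);
 }

 int
 main (void)
 {
   struct lwp lwps[] = { { 100 }, { 205 }, { 3001 } };  /* sorted */
   struct lwp *lp = find_lwp (lwps, 3, 205);

   return (lp != NULL && lp->lwpid == 205) ? 0 : 1;
 }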

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* linux-nat.c (lwp_info_p): New typedef.
	(lwp_list_by_lwpid): New VEC.
	(lwp_list): Tweak comment.
	(lwp_list_add, lwp_list_remove): New functions.
	(purge_lwp_list): Rewrite, walk the sorted-by-lwpid list instead.
	(lp_lwpid_lessthan): New.
	(add_initial_lwp): Add lwp to sorted-by-lwpid list too.  Use
	lwp_list_add.
	(lwp_lwpid_cmp, find_lwp_by_lwpid_index): New functions.
	(delete_lwp): Use lwp_list_remove.  Remove lwp from
	sorted-by-lwpid list too.
	(find_lwp_pid): Search in the sorted-by-lwpid list.
	* linux-nat.h (struct lwp_info) <prev>: New field.
---
 gdb/linux-nat.c | 159 ++++++++++++++++++++++++++++++++++++++++++++------------
 gdb/linux-nat.h |   4 +-
 2 files changed, 128 insertions(+), 35 deletions(-)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 509212e..ea2e4dc 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -677,8 +677,46 @@ linux_child_set_syscall_catchpoint (struct target_ops *self,
   return 0;
 }
 
-/* List of known LWPs.  */
+typedef struct lwp_info *lwp_info_p;
+
+DEF_VEC_P(lwp_info_p);
+
+/* List of known LWPs, sorted by LWP PID.  This speeds up the common
+   case of mapping a PID returned from the kernel to our corresponding
+   lwp_info data structure.  */
+static VEC(lwp_info_p) *lwp_list_by_lwpid;
+
+/* Head of double-linked list of known LWPs.  Sorted by reverse
+   creation order.  This order is assumed in some cases.  E.g.,
+   reaping status after killing all lwps of a process: the leader LWP
+   must be reaped last.  */
 struct lwp_info *lwp_list;
+
+/* Add LP to sorted-by-creation-order double-linked list.  */
+
+static void
+lwp_list_add (struct lwp_info *lp)
+{
+  lp->next = lwp_list;
+  if (lwp_list != NULL)
+    lwp_list->prev = lp;
+  lwp_list = lp;
+}
+
+/* Remove LP from sorted-by-creation-order double-linked list.  */
+
+static void
+lwp_list_remove (struct lwp_info *lp)
+{
+  /* Remove from sorted-by-creation-order list.  */
+  if (lp->next != NULL)
+    lp->next->prev = lp->prev;
+  if (lp->prev != NULL)
+    lp->prev->next = lp->next;
+  if (lp == lwp_list)
+    lwp_list = lp->next;
+}
+
 \f
 
 /* Original signal mask.  */
@@ -759,26 +797,36 @@ lwp_free (struct lwp_info *lp)
 static void
 purge_lwp_list (int pid)
 {
-  struct lwp_info *lp, *lpprev, *lpnext;
-
-  lpprev = NULL;
+  struct lwp_info **base = VEC_address (lwp_info_p, lwp_list_by_lwpid);
+  struct lwp_info **lp_out_p;
+  struct lwp_info *lp_in;
+  int ix;
 
-  for (lp = lwp_list; lp; lp = lpnext)
+  lp_out_p = base;
+  for (ix = 0; VEC_iterate (lwp_info_p, lwp_list_by_lwpid, ix, lp_in); ix++)
     {
-      lpnext = lp->next;
-
-      if (ptid_get_pid (lp->ptid) == pid)
+      if (ptid_get_pid (lp_in->ptid) == pid)
 	{
-	  if (lp == lwp_list)
-	    lwp_list = lp->next;
-	  else
-	    lpprev->next = lp->next;
-
-	  lwp_free (lp);
+	  lwp_list_remove (lp_in);
+	  lwp_free (lp_in);
 	}
       else
-	lpprev = lp;
+	{
+	  *lp_out_p = lp_in;
+	  lp_out_p++;
+	}
     }
+
+  VEC_truncate (lwp_info_p, lwp_list_by_lwpid, lp_out_p - base);
+}
+
+/* Predicate function which returns true if LHS should sort before RHS
+   in a list of LWPs, useful for VEC_lower_bound.  */
+
+static int
+lp_lwpid_lessthan (struct lwp_info *lhs, struct lwp_info *rhs)
+{
+  return ptid_get_lwp (lhs->ptid) < ptid_get_lwp (rhs->ptid);
 }
 
 /* Add the LWP specified by PTID to the list.  PTID is the first LWP
@@ -799,6 +847,7 @@ static struct lwp_info *
 add_initial_lwp (ptid_t ptid)
 {
   struct lwp_info *lp;
+  int ix;
 
   gdb_assert (ptid_lwp_p (ptid));
 
@@ -812,8 +861,13 @@ add_initial_lwp (ptid_t ptid)
   lp->ptid = ptid;
   lp->core = -1;
 
-  lp->next = lwp_list;
-  lwp_list = lp;
+  /* Add to sorted-by-creation-order list.  */
+  lwp_list_add (lp);
+
+  /* Add to sorted-by-lwpid list too.  */
+  ix = VEC_lower_bound (lwp_info_p, lwp_list_by_lwpid, lp,
+			lp_lwpid_lessthan);
+  VEC_safe_insert (lwp_info_p, lwp_list_by_lwpid, ix, lp);
 
   return lp;
 }
@@ -839,27 +893,65 @@ add_lwp (ptid_t ptid)
   return lp;
 }
 
+/* Bsearch comparison function for lwp_list_by_lwpid.  */
+
+static int
+lwp_lwpid_cmp (const void *key, const void *elt)
+{
+  const long lwpid_key = *(long *) key;
+  const struct lwp_info *lp = *(const struct lwp_info **) elt;
+  const long lwpid_lp = ptid_get_lwp (lp->ptid);
+
+  if (lwpid_key < lwpid_lp)
+    return -1;
+  if (lwpid_key > lwpid_lp)
+    return 1;
+  return 0;
+}
+
+/* Find an LWP in the sorted-by-lwpid LWP vector.  Returns vector
+   index if found, or -1 if not found.  */
+
+static int
+find_lwp_by_lwpid_index (long lwpid)
+{
+  struct lwp_info **found_p;
+  struct lwp_info **base;
+  size_t nmemb;
+  size_t size;
+
+  base = VEC_address (lwp_info_p, lwp_list_by_lwpid);
+  nmemb = VEC_length (lwp_info_p, lwp_list_by_lwpid);
+  size = sizeof (struct lwp_info **);
+  found_p = (struct lwp_info **) bsearch (&lwpid, base, nmemb,
+				       size, lwp_lwpid_cmp);
+  if (found_p != NULL)
+    return found_p - base;
+
+  return -1;
+}
+
 /* Remove the LWP specified by PID from the list.  */
 
 static void
 delete_lwp (ptid_t ptid)
 {
-  struct lwp_info *lp, *lpprev;
+  struct lwp_info *lp;
+  int ix;
 
-  lpprev = NULL;
+  ix = find_lwp_by_lwpid_index (ptid_get_lwp (ptid));
+  if (ix == -1)
+    return;
 
-  for (lp = lwp_list; lp; lpprev = lp, lp = lp->next)
-    if (ptid_equal (lp->ptid, ptid))
-      break;
+  lp = VEC_index (lwp_info_p, lwp_list_by_lwpid, ix);
 
-  if (!lp)
-    return;
+  /* Remove from sorted-by-lwpid list.  */
+  VEC_ordered_remove (lwp_info_p, lwp_list_by_lwpid, ix);
 
-  if (lpprev)
-    lpprev->next = lp->next;
-  else
-    lwp_list = lp->next;
+  /* Remove from sorted-by-creation-order list.  */
+  lwp_list_remove (lp);
 
+  /* Release.  */
   lwp_free (lp);
 }
 
@@ -870,18 +962,17 @@ static struct lwp_info *
 find_lwp_pid (ptid_t ptid)
 {
   struct lwp_info *lp;
-  int lwp;
+  int lwp, ix;
 
   if (ptid_lwp_p (ptid))
     lwp = ptid_get_lwp (ptid);
   else
     lwp = ptid_get_pid (ptid);
 
-  for (lp = lwp_list; lp; lp = lp->next)
-    if (lwp == ptid_get_lwp (lp->ptid))
-      return lp;
-
-  return NULL;
+  ix = find_lwp_by_lwpid_index (lwp);
+  if (ix == -1)
+    return NULL;
+  return VEC_index (lwp_info_p, lwp_list_by_lwpid, ix);
 }
 
 /* See nat/linux-nat.h.  */
diff --git a/gdb/linux-nat.h b/gdb/linux-nat.h
index 73888e5..94fc5ed 100644
--- a/gdb/linux-nat.h
+++ b/gdb/linux-nat.h
@@ -101,7 +101,9 @@ struct lwp_info
   /* Arch-specific additions.  */
   struct arch_lwp_info *arch_private;
 
-  /* Next LWP in list.  */
+  /* Previous and next pointers in double-linked list of known LWPs,
+     sorted by reverse creation order.  */
+  struct lwp_info *prev;
   struct lwp_info *next;
 };
 
-- 
2.5.5

* [PATCH 6/6] Fix PR gdb/19828: gdb -p <process from a container>: internal error
  2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
                   ` (3 preceding siblings ...)
  2016-05-19 14:56 ` [PATCH 5/6] Make gdb/linux-nat.c consider a waitstatus pending on the infrun side Pedro Alves
@ 2016-05-19 14:57 ` Pedro Alves
  2016-05-23 10:14   ` Yao Qi
  2016-05-25 17:43   ` [pushed/7.11.1] " Pedro Alves
  2016-05-19 14:57 ` [PATCH 4/6] [Linux] Optimize PID -> struct lwp_info lookup Pedro Alves
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Pedro Alves @ 2016-05-19 14:57 UTC (permalink / raw)
  To: gdb-patches

When GDB attaches to a process, it looks at the /proc/PID/task/ dir
for all clone threads of that process, and attaches to each of them.
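
For reference, enumerating the LWPs to attach to is just a directory
listing; a standalone sketch of that step (not GDB's
linux_proc_attach_tgid_threads, which additionally re-lists until the
set of threads stops growing and ptrace-attaches to each entry):

 #include <dirent.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>

 int
 main (int argc, char **argv)
 {
   char path[64];
   DIR *dir;
   struct dirent *dp;
   long pid = argc > 1 ? strtol (argv[1], NULL, 10) : (long) getpid ();

   snprintf (path, sizeof path, "/proc/%ld/task", pid);
   dir = opendir (path);
   if (dir == NULL)
     return 1;

   while ((dp = readdir (dir)) != NULL)
     if (dp->d_name[0] != '.')
       printf ("LWP %s\n", dp->d_name);  /* GDB would attach here.  */

   closedir (dir);
   return 0;
 }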

Usually, if there is more than one clone thread, it means the program
is multi-threaded and linked with pthreads.  Thus when GDB soon after
attaching finds and loads a libthread_db matching the process, it'll
add a thread to the thread list for each of the initially found
lower-level LWPs.

If, however, GDB fails to find/load a matching libthread_db, nothing
adds the LWPs to the thread list.  And because of that, "detach"
hits an internal error:

  (gdb) PASS: gdb.threads/clone-attach-detach.exp: fg attach 1: attach
  info threads
    Id   Target Id         Frame
  * 1    LWP 6891 "clone-attach-de" 0x00007f87e5fd0790 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
  (gdb) FAIL: gdb.threads/clone-attach-detach.exp: fg attach 1: info threads shows two LWPs
  detach
  .../src/gdb/thread.c:1010: internal-error: is_executing: Assertion `tp' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n)
  FAIL: gdb.threads/clone-attach-detach.exp: fg attach 1: detach (GDB internal error)

From here:

  ...
  #8  0x00000000007ba7cc in internal_error (file=0x98ea68 ".../src/gdb/thread.c", line=1010, fmt=0x98ea30 "%s: Assertion `%s' failed.")
      at .../src/gdb/common/errors.c:55
  #9  0x000000000064bb83 in is_executing (ptid=...) at .../src/gdb/thread.c:1010
  #10 0x00000000004c23bb in get_pending_status (lp=0x12c5cc0, status=0x7fffffffdc0c) at .../src/gdb/linux-nat.c:1235
  #11 0x00000000004c2738 in detach_callback (lp=0x12c5cc0, data=0x0) at .../src/gdb/linux-nat.c:1317
  #12 0x00000000004c1a2a in iterate_over_lwps (filter=..., callback=0x4c2599 <detach_callback>, data=0x0) at .../src/gdb/linux-nat.c:899
  #13 0x00000000004c295c in linux_nat_detach (ops=0xe7bd30, args=0x0, from_tty=1) at .../src/gdb/linux-nat.c:1358
  #14 0x000000000068284d in delegate_detach (self=0xe7bd30, arg1=0x0, arg2=1) at .../src/gdb/target-delegates.c:34
  #15 0x0000000000694141 in target_detach (args=0x0, from_tty=1) at .../src/gdb/target.c:2241
  #16 0x0000000000630582 in detach_command (args=0x0, from_tty=1) at .../src/gdb/infcmd.c:2975
  ...

Tested on x86-64 Fedora 23.  Also confirmed the test passes against
gdbserver with "maint set target-non-stop".

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* linux-nat.c (attach_proc_task_lwp_callback): Mark the lwp
	resumed, and add the thread to GDB's thread list.

testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* gdb.threads/clone-attach-detach.c: New file.
	* gdb.threads/clone-attach-detach.exp: New file.
---
 gdb/linux-nat.c                                   | 10 +++
 gdb/testsuite/gdb.threads/clone-attach-detach.c   | 66 +++++++++++++++
 gdb/testsuite/gdb.threads/clone-attach-detach.exp | 98 +++++++++++++++++++++++
 3 files changed, 174 insertions(+)
 create mode 100644 gdb/testsuite/gdb.threads/clone-attach-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/clone-attach-detach.exp

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index f2bdddc..ee61e70 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1182,6 +1182,16 @@ attach_proc_task_lwp_callback (ptid_t ptid)
 	  /* We need to wait for a stop before being able to make the
 	     next ptrace call on this LWP.  */
 	  lp->must_set_ptrace_flags = 1;
+
+	  /* So that wait collects the SIGSTOP.  */
+	  lp->resumed = 1;
+
+	  /* Also add the LWP to gdb's thread list, in case a
+	     matching libthread_db is not found (or the process uses
+	     raw clone).  */
+	  add_thread (lp->ptid);
+	  set_running (lp->ptid, 1);
+	  set_executing (lp->ptid, 1);
 	}
 
       return 1;
diff --git a/gdb/testsuite/gdb.threads/clone-attach-detach.c b/gdb/testsuite/gdb.threads/clone-attach-detach.c
new file mode 100644
index 0000000..daa8f1f
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/clone-attach-detach.c
@@ -0,0 +1,66 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2016 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#define STACK_SIZE 0x1000
+
+int clone_pid;
+
+static int
+clone_fn (void *unused)
+{
+  /* Wait for alarm.  */
+  while (1)
+    sleep (1);
+  return 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  unsigned char *stack;
+  int res;
+
+  alarm (300);
+
+  stack = malloc (STACK_SIZE);
+  assert (stack != NULL);
+
+#define CLONE_FLAGS (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)
+
+#ifdef __ia64__
+  clone_pid = __clone2 (clone_fn, stack, STACK_SIZE, CLONE_FLAGS, NULL);
+#else
+  clone_pid = clone (clone_fn, stack + STACK_SIZE, CLONE_FLAGS, NULL);
+#endif
+
+  assert (clone_pid > 0);
+
+  /* Wait for alarm.  */
+  while (1)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/clone-attach-detach.exp b/gdb/testsuite/gdb.threads/clone-attach-detach.exp
new file mode 100644
index 0000000..42e9914
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/clone-attach-detach.exp
@@ -0,0 +1,98 @@
+# This testcase is part of GDB, the GNU debugger.
+
+# Copyright 2016 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test attach / detach from a process that uses raw clone.  We use raw
+# clone as a proxy for when libthread_db is not available.
+
+# This only works on targets with the Linux kernel.
+if ![istarget *-*-linux*] {
+    return
+}
+
+if {![can_spawn_for_attach]} {
+    return 0
+}
+
+standard_testfile
+
+if [prepare_for_testing "failed to prepare" $testfile $srcfile {debug}] {
+    return -1
+}
+
+clean_restart ${binfile}
+
+set test_spawn_id [spawn_wait_for_attach $binfile]
+set testpid [spawn_id_get_pid $test_spawn_id]
+
+# Native/gdbserver.
+set thread_re "(LWP $decimal|Thread )"
+
+# Try attach / detach a few times, in case GDB ends up with stale
+# state after detaching.
+
+set attempts 3
+for {set attempt 1} {$attempt <= $attempts} {incr attempt} {
+    with_test_prefix "fg attach $attempt" {
+
+	gdb_test "attach $testpid" \
+	    "Attaching to program.*process $testpid.*" \
+	    "attach"
+
+	gdb_test "info threads" \
+	    "1.*${thread_re}.*\r\n.*2.*${thread_re}.*" \
+	    "info threads shows two LWPs"
+
+	gdb_test "detach" "Detaching from .*, process $testpid"
+    }
+}
+
+# Same, but use async/background attach.
+
+# If debugging with target remote, check whether the all-stop variant
+# of the RSP is being used.  If so, we can't run the background tests.
+if {[target_info exists gdb_protocol]
+    && ([target_info gdb_protocol] == "remote"
+	|| [target_info gdb_protocol] == "extended-remote")} {
+
+    set test "maint show target-non-stop"
+    gdb_test_multiple "maint show target-non-stop" $test {
+	-re "(is|currently) on.*$gdb_prompt $" {
+	}
+	-re "(is|currently) off.*$gdb_prompt $" {
+	    unsupported "bg attach: can't issue info threads while target is running"
+	    return 0
+	}
+    }
+}
+
+set attempts 3
+for {set attempt 1} {$attempt <= $attempts} {incr attempt} {
+    with_test_prefix "bg attach $attempt" {
+
+	gdb_test "attach $testpid &" \
+	    "Attaching to program.*process $testpid.*" \
+	    "attach"
+
+	gdb_test "info threads" \
+	    "1.*${thread_re}.*\\(running\\)\r\n.*2.*${thread_re}.*\\(running\\)" \
+	    "info threads shows two LWPs"
+
+	gdb_test "detach" "Detaching from .*, process $testpid"
+    }
+}
+
+kill_wait_spawned_process $test_spawn_id
-- 
2.5.5

* Re: [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations
  2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
                   ` (5 preceding siblings ...)
  2016-05-19 14:57 ` [PATCH 4/6] [Linux] Optimize PID -> struct lwp_info lookup Pedro Alves
@ 2016-05-19 15:11 ` Pedro Alves
  2016-05-24 13:57 ` Pedro Alves
  7 siblings, 0 replies; 19+ messages in thread
From: Pedro Alves @ 2016-05-19 15:11 UTC (permalink / raw)
  To: gdb-patches

On 05/19/2016 03:48 PM, Pedro Alves wrote:
> 
> Fedora 23 is carrying a simpler GDB fix for PR gdb/19828 that avoids
> all these optimizations, but at the expense of leaving "attach&"
> (background attach) still broken.  That might be safer to put in
> 7.11.1, but then again I dislike the idea of leaving attach& broken.
> I'll find and post that patch for you all to see what I mean.

I've posted it here now:

 https://sourceware.org/ml/gdb-patches/2016-05/msg00335.html

Thanks,
Pedro Alves

* Re: [PATCH 6/6] Fix PR gdb/19828: gdb -p <process from a container>: internal error
  2016-05-19 14:57 ` [PATCH 6/6] Fix PR gdb/19828: gdb -p <process from a container>: internal error Pedro Alves
@ 2016-05-23 10:14   ` Yao Qi
  2016-05-25 17:43   ` [pushed/7.11.1] " Pedro Alves
  1 sibling, 0 replies; 19+ messages in thread
From: Yao Qi @ 2016-05-23 10:14 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

> gdb/ChangeLog:
> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>
>
> 	PR gdb/19828
> 	* linux-nat.c (attach_proc_task_lwp_callback): Mark the lwp
> 	resumed, and add the thread to GDB's thread list.
>
> testsuite/ChangeLog:
> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>
>
> 	PR gdb/19828
> 	* gdb.threads/clone-attach-detach.c: New file.
> 	* gdb.threads/clone-attach-detach.exp: New file.

Patch is good to me.

-- 
Yao (齐尧)

* Re: [PATCH 1/6] Linux native thread create/exit events support
  2016-05-19 14:48 ` [PATCH 1/6] Linux native thread create/exit events support Pedro Alves
@ 2016-05-23 10:49   ` Yao Qi
  2016-05-23 13:00     ` Pedro Alves
  0 siblings, 1 reply; 19+ messages in thread
From: Yao Qi @ 2016-05-23 10:49 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

Patch is good to me, one question below,

> -      /* This was the last lwp in the process.  Since events are
> +      /* This may be the last lwp in the process.  Since events are
>  	 serialized to GDB core, we may not be able report this one
>  	 right now, but GDB core and the other target layers will want
>  	 to be notified about the exit code/signal, leave the status

Is this paragraph of comment still valid?

-- 
Yao (齐尧)

* Re: [PATCH 2/6] [Linux] Read vDSO range from /proc/PID/task/PID/maps instead of /proc/PID/maps
  2016-05-19 14:48 ` [PATCH 2/6] [Linux] Read vDSO range from /proc/PID/task/PID/maps instead of /proc/PID/maps Pedro Alves
@ 2016-05-23 11:08   ` Yao Qi
  0 siblings, 0 replies; 19+ messages in thread
From: Yao Qi @ 2016-05-23 11:08 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

> gdb/ChangeLog:
> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>
>
> 	PR gdb/19828
> 	* linux-tdep.c (find_mapping_size): Delete.
> 	(linux_vsyscall_range_raw): Rewrite reading from
> 	/proc/PID/task/PID/maps directly instead of using
> 	gdbarch_find_memory_regions.

Looks good to me.

-- 
Yao (齐尧)

* Re: [PATCH 3/6] [Linux] Avoid refetching core-of-thread if thread hasn't run
  2016-05-19 14:48 ` [PATCH 3/6] [Linux] Avoid refetching core-of-thread if thread hasn't run Pedro Alves
@ 2016-05-23 12:45   ` Yao Qi
  0 siblings, 0 replies; 19+ messages in thread
From: Yao Qi @ 2016-05-23 12:45 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

> gdb/ChangeLog:
> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>
>
> 	* linux-nat.c (linux_resume_one_lwp_throw): Clear the LWP's core
> 	field.
> 	(linux_nat_update_thread_list): Don't fetch the core if already
> 	known.

Looks good to me.

-- 
Yao (齐尧)

* Re: [PATCH 1/6] Linux native thread create/exit events support
  2016-05-23 10:49   ` Yao Qi
@ 2016-05-23 13:00     ` Pedro Alves
  0 siblings, 0 replies; 19+ messages in thread
From: Pedro Alves @ 2016-05-23 13:00 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

On 05/23/2016 11:48 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
> Patch is good to me, one question below,
> 
>> -      /* This was the last lwp in the process.  Since events are
>> +      /* This may be the last lwp in the process.  Since events are
>>  	 serialized to GDB core, we may not be able report this one
>>  	 right now, but GDB core and the other target layers will want
>>  	 to be notified about the exit code/signal, leave the status
> 
> Is this paragraph of comment still valid?

Indeed, doesn't look like it adds much now.  I'll remove it.

Thanks,
Pedro Alves

* Re: [PATCH 4/6] [Linux] Optimize PID -> struct lwp_info lookup
  2016-05-19 14:57 ` [PATCH 4/6] [Linux] Optimize PID -> struct lwp_info lookup Pedro Alves
@ 2016-05-23 13:12   ` Yao Qi
  2016-05-23 18:10     ` [PATCH v2/htab " Pedro Alves
  0 siblings, 1 reply; 19+ messages in thread
From: Yao Qi @ 2016-05-23 13:12 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

Pedro Alves <palves@redhat.com> writes:

> Fix this by keeping a list of LWPs sorted by PID, so we can use binary
> searches for lookup.

Is it better to use htab?

-- 
Yao (齐尧)

* [PATCH v2/htab 4/6] [Linux] Optimize PID -> struct lwp_info lookup
  2016-05-23 13:12   ` Yao Qi
@ 2016-05-23 18:10     ` Pedro Alves
  2016-05-24  9:33       ` Yao Qi
  0 siblings, 1 reply; 19+ messages in thread
From: Pedro Alves @ 2016-05-23 18:10 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

On 05/23/2016 02:12 PM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> Fix this by keeping a list of LWPs sorted by PID, so we can use binary
>> searches for lookup.
> 
> Is it better to use htab?
> 

I had assumed the usage pattern would be clearly in favor of a
sorted vector, but I tried a htab now, and at least for sets of
threads of the sizes I was looking at (2k -> 10k), the numbers show
otherwise.

attach-many-short-lived-threads has threads come and go
too quickly for it to be easy to tell any difference -- the
number of threads running at any given time isn't very stable.

So I hacked the test further to make it spawn threads, and
then have the threads just sleep.  This provides a different
scenario, one where we always iterate over the /proc/PID/task/
lwp list exactly twice, so it may give different results, but
it probably doesn't really matter -- the difference is probably
not that significant.

So, with a process that spawns many threads and just
has them all sleep, I timed attaching + detach with:

 $ time gdb -iex "set print thread-events off" --batch -q -p PID

I ran that 3 times each for 4000 threads and 8000 threads, once
using a htab, and once using a sorted vec + bsearch.  The results
were:

8000 threads, htab:

 | run  | 1         | 2         | 3         |
 |------+-----------+-----------+-----------|
 | real | 0m29.831s | 0m31.483s | 0m30.333s |
 | user | 0m20.548s | 0m22.124s | 0m21.143s |
 | sys  | 0m9.291s  | 0m9.366s  | 0m9.197s  |

8000 threads, sorted vec:

 | run  | 1         | 2         | 3         |
 |------+-----------+-----------+-----------|
 | real | 0m33.569s | 0m32.178s | 0m32.480s |
 | user | 0m24.027s | 0m22.981s | 0m23.344s |
 | sys  | 0m9.549s  | 0m9.207s  | 0m9.146s  |

4000 threads, htab:

 | run  | 1        | 2        | 3        |
 |------+----------+----------+----------|
 | real | 0m6.710s | 0m6.683s | 0m6.891s |
 | user | 0m4.561s | 0m4.525s | 0m4.593s |
 | sys  | 0m2.159s | 0m2.168s | 0m2.307s |

4000 threads, sorted vec:

 | run  | 1        | 2        | 3        |
 |------+----------+----------+----------|
 | real | 0m7.282s | 0m7.375s | 0m7.282s |
 | user | 0m5.132s | 0m5.173s | 0m5.080s |
 | sys  | 0m2.162s | 0m2.210s | 0m2.211s |

So given these numbers, and taking the "default to hash tab
unless you have a good reason not to" principle, I agree that
it's better.

So here's v2, using a htab.

From f065eca4c841eddd2f917c5d2702c525a0065338 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Mon, 23 May 2016 14:38:58 +0100
Subject: [PATCH v2] [Linux] Optimize PID -> struct lwp_info lookup

Hacking the gdb.threads/attach-many-short-lived-threads.exp test to
spawn thousands of threads instead of dozens, and running gdb under
perf, I saw that GDB was spending most of the time in find_lwp_pid:

   - captured_main
      - 93.61% catch_command_errors
         - 87.41% attach_command
            - 87.40% linux_nat_attach
               - 87.40% linux_proc_attach_tgid_threads
                  - 82.38% attach_proc_task_lwp_callback
                     - 81.01% find_lwp_pid
                          5.30% ptid_get_lwp
                        + 0.10% ptid_lwp_p
                     + 0.64% add_thread
                     + 0.26% set_running
                     + 0.24% set_executing
                       0.12% ptid_get_lwp
                     + 0.01% ptrace
                     + 0.01% add_lwp

attach_proc_task_lwp_callback is called once for each LWP that we
attach to, found by listing the /proc/PID/task/ directory.  In turn,
attach_proc_task_lwp_callback calls find_lwp_pid to check whether the
LWP we're about to try to attach to is already known.  Since
find_lwp_pid does a linear walk over the whole LWP list, this becomes
quadratic.  We do the /proc/PID/task/ listing until we get two
iterations in a row where we find no new threads.  So the second and
following times we walk the /proc/PID/task/ dir, we're going to take
an even worse find_lwp_pid hit.

Fix this by adding a hash table keyed by LWP PID, for fast lookup.

The linked list embedded in the LWP structure itself is kept, and made
a double-linked list, so that removals from that list are O(1).  An
earlier version of this patch got rid of this list altogether, but
that revealed hidden dependencies / assumptions on how the list is
sorted.  For example, killing a process and then waiting for all the
LWPs status using iterate_over_lwps only works as is because the
leader LWP is always last in the list.  So I thought it better to take
an incremental approach and make this patch concern itself _only_ with
the PID lookup optimization.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* linux-nat.c (lwp_lwpid_htab): New htab.
	(lwp_info_hash, lwp_lwpid_htab_eq, lwp_lwpid_htab_create)
	(lwp_lwpid_htab_add_lwp): New functions.
	(lwp_list): Tweak comment.
	(lwp_list_add, lwp_list_remove, lwp_lwpid_htab_remove_pid): New
	functions.
	(purge_lwp_list): Rewrite, using htab_traverse_noresize.
	(add_initial_lwp): Add lwp to htab too.  Use lwp_list_add.
	(delete_lwp): Use lwp_list_remove.  Remove htab too.
	(find_lwp_pid): Search in htab.
	(_initialize_linux_nat): Call lwp_lwpid_htab_create.
	* linux-nat.h (struct lwp_info) <prev>: New field.
---
 gdb/linux-nat.c | 158 ++++++++++++++++++++++++++++++++++++++++++--------------
 gdb/linux-nat.h |   4 +-
 2 files changed, 123 insertions(+), 39 deletions(-)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 7723d1e..7e5f4b5 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -677,8 +677,85 @@ linux_child_set_syscall_catchpoint (struct target_ops *self,
   return 0;
 }
 
-/* List of known LWPs.  */
+/* List of known LWPs, keyed by LWP PID.  This speeds up the common
+   case of mapping a PID returned from the kernel to our corresponding
+   lwp_info data structure.  */
+static htab_t lwp_lwpid_htab;
+
+/* Calculate a hash from a lwp_info's LWP PID.  */
+
+static hashval_t
+lwp_info_hash (const void *ap)
+{
+  const struct lwp_info *lp = (struct lwp_info *) ap;
+  pid_t pid = ptid_get_lwp (lp->ptid);
+
+  return iterative_hash_object (pid, 0);
+}
+
+/* Equality function for the lwp_info hash table.  Compares the LWP's
+   PID.  */
+
+static int
+lwp_lwpid_htab_eq (const void *a, const void *b)
+{
+  const struct lwp_info *entry = (const struct lwp_info *) a;
+  const struct lwp_info *element = (const struct lwp_info *) b;
+
+  return ptid_get_lwp (entry->ptid) == ptid_get_lwp (element->ptid);
+}
+
+/* Create the lwp_lwpid_htab hash table.  */
+
+static void
+lwp_lwpid_htab_create (void)
+{
+  lwp_lwpid_htab = htab_create (100, lwp_info_hash, lwp_lwpid_htab_eq, NULL);
+}
+
+/* Add LP to the hash table.  */
+
+static void
+lwp_lwpid_htab_add_lwp (struct lwp_info *lp)
+{
+  void **slot;
+
+  slot = htab_find_slot (lwp_lwpid_htab, lp, INSERT);
+  gdb_assert (slot != NULL && *slot == NULL);
+  *slot = lp;
+}
+
+/* Head of double-linked list of known LWPs.  Sorted by reverse
+   creation order.  This order is assumed in some cases.  E.g.,
+   reaping status after killing all lwps of a process: the leader LWP
+   must be reaped last.  */
 struct lwp_info *lwp_list;
+
+/* Add LP to sorted-by-creation-order double-linked list.  */
+
+static void
+lwp_list_add (struct lwp_info *lp)
+{
+  lp->next = lwp_list;
+  if (lwp_list != NULL)
+    lwp_list->prev = lp;
+  lwp_list = lp;
+}
+
+/* Remove LP from sorted-by-creation-order double-linked list.  */
+
+static void
+lwp_list_remove (struct lwp_info *lp)
+{
+  /* Remove from sorted-by-creation-order list.  */
+  if (lp->next != NULL)
+    lp->next->prev = lp->prev;
+  if (lp->prev != NULL)
+    lp->prev->next = lp->next;
+  if (lp == lwp_list)
+    lwp_list = lp->next;
+}
+
 \f
 
 /* Original signal mask.  */
@@ -754,31 +831,30 @@ lwp_free (struct lwp_info *lp)
   xfree (lp);
 }
 
-/* Remove all LWPs belong to PID from the lwp list.  */
+/* Traversal function for purge_lwp_list.  */
 
-static void
-purge_lwp_list (int pid)
+static int
+lwp_lwpid_htab_remove_pid (void **slot, void *info)
 {
-  struct lwp_info *lp, *lpprev, *lpnext;
-
-  lpprev = NULL;
+  struct lwp_info *lp = (struct lwp_info *) *slot;
+  int pid = *(int *) info;
 
-  for (lp = lwp_list; lp; lp = lpnext)
+  if (ptid_get_pid (lp->ptid) == pid)
     {
-      lpnext = lp->next;
+      htab_clear_slot (lwp_lwpid_htab, slot);
+      lwp_list_remove (lp);
+      lwp_free (lp);
+    }
 
-      if (ptid_get_pid (lp->ptid) == pid)
-	{
-	  if (lp == lwp_list)
-	    lwp_list = lp->next;
-	  else
-	    lpprev->next = lp->next;
+  return 1;
+}
 
-	  lwp_free (lp);
-	}
-      else
-	lpprev = lp;
-    }
+/* Remove all LWPs belonging to PID from the lwp list.  */
+
+static void
+purge_lwp_list (int pid)
+{
+  htab_traverse_noresize (lwp_lwpid_htab, lwp_lwpid_htab_remove_pid, &pid);
 }
 
 /* Add the LWP specified by PTID to the list.  PTID is the first LWP
@@ -812,8 +888,11 @@ add_initial_lwp (ptid_t ptid)
   lp->ptid = ptid;
   lp->core = -1;
 
-  lp->next = lwp_list;
-  lwp_list = lp;
+  /* Add to sorted-by-creation-order list.  */
+  lwp_list_add (lp);
+
+  /* Add to keyed-by-pid htab.  */
+  lwp_lwpid_htab_add_lwp (lp);
 
   return lp;
 }
@@ -844,22 +923,24 @@ add_lwp (ptid_t ptid)
 static void
 delete_lwp (ptid_t ptid)
 {
-  struct lwp_info *lp, *lpprev;
+  struct lwp_info *lp;
+  void **slot;
+  struct lwp_info dummy;
 
-  lpprev = NULL;
+  dummy.ptid = ptid;
+  slot = htab_find_slot (lwp_lwpid_htab, &dummy, NO_INSERT);
+  if (slot == NULL)
+    return;
 
-  for (lp = lwp_list; lp; lpprev = lp, lp = lp->next)
-    if (ptid_equal (lp->ptid, ptid))
-      break;
+  lp = *(struct lwp_info **) slot;
+  gdb_assert (lp != NULL);
 
-  if (!lp)
-    return;
+  htab_clear_slot (lwp_lwpid_htab, slot);
 
-  if (lpprev)
-    lpprev->next = lp->next;
-  else
-    lwp_list = lp->next;
+  /* Remove from sorted-by-creation-order list.  */
+  lwp_list_remove (lp);
 
+  /* Release.  */
   lwp_free (lp);
 }
 
@@ -871,17 +952,16 @@ find_lwp_pid (ptid_t ptid)
 {
   struct lwp_info *lp;
   int lwp;
+  struct lwp_info dummy;
 
   if (ptid_lwp_p (ptid))
     lwp = ptid_get_lwp (ptid);
   else
     lwp = ptid_get_pid (ptid);
 
-  for (lp = lwp_list; lp; lp = lp->next)
-    if (lwp == ptid_get_lwp (lp->ptid))
-      return lp;
-
-  return NULL;
+  dummy.ptid = ptid_build (0, lwp, 0);
+  lp = (struct lwp_info *) htab_find (lwp_lwpid_htab, &dummy);
+  return lp;
 }
 
 /* See nat/linux-nat.h.  */
@@ -4874,6 +4954,8 @@ Enables printf debugging output."),
   sigdelset (&suspend_mask, SIGCHLD);
 
   sigemptyset (&blocked_mask);
+
+  lwp_lwpid_htab_create ();
 }
 \f
 
diff --git a/gdb/linux-nat.h b/gdb/linux-nat.h
index 73888e5..94fc5ed 100644
--- a/gdb/linux-nat.h
+++ b/gdb/linux-nat.h
@@ -101,7 +101,9 @@ struct lwp_info
   /* Arch-specific additions.  */
   struct arch_lwp_info *arch_private;
 
-  /* Next LWP in list.  */
+  /* Previous and next pointers in double-linked list of known LWPs,
+     sorted by reverse creation order.  */
+  struct lwp_info *prev;
   struct lwp_info *next;
 };
 
-- 
2.5.5
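
For reference, here is a minimal standalone sketch of the dummy-key lookup
idiom the patch relies on, written against libiberty's hashtab.h.  The
struct and field names below are made up for illustration and are not
GDB's types; only the htab_* calls mirror what find_lwp_pid and
delete_lwp do:

  #include <stdio.h>
  #include "hashtab.h"  /* libiberty */

  struct lwp { long lwpid; const char *name; };

  static hashval_t
  lwp_hash (const void *p)
  {
    return (hashval_t) ((const struct lwp *) p)->lwpid;
  }

  static int
  lwp_eq (const void *a, const void *b)
  {
    return (((const struct lwp *) a)->lwpid
            == ((const struct lwp *) b)->lwpid);
  }

  int
  main (void)
  {
    htab_t tab = htab_create (100, lwp_hash, lwp_eq, NULL);
    struct lwp one = { 42, "worker" };
    struct lwp dummy = { 42, NULL };
    struct lwp *found;

    /* Insert: find the slot for the key, then store the element.  */
    *htab_find_slot (tab, &one, INSERT) = &one;

    /* O(1) lookup with a dummy element that only has the key filled
       in, mirroring what find_lwp_pid does with dummy.ptid.  */
    found = (struct lwp *) htab_find (tab, &dummy);
    printf ("%s\n", found != NULL ? found->name : "not found");

    htab_delete (tab);
    return 0;
  }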


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2/htab 4/6] [Linux] Optimize PID -> struct lwp_info lookup
  2016-05-23 18:10     ` [PATCH v2/htab " Pedro Alves
@ 2016-05-24  9:33       ` Yao Qi
  2016-05-24 13:47         ` Pedro Alves
  0 siblings, 1 reply; 19+ messages in thread
From: Yao Qi @ 2016-05-24  9:33 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Yao Qi, gdb-patches

Pedro Alves <palves@redhat.com> writes:

Hi Pedro,
Patch is good to me, a nit on comments,

> +/* Head of double-linked list of known LWPs.  Sorted by reverse
> +   creation order.  This order is assumed in some cases.  E.g.,
> +   reaping status after killing all LWPs of a process: the leader LWP
> +   must be reaped last.  */
>  struct lwp_info *lwp_list;

> +
> +/* Add LP to sorted-by-creation-order double-linked list.  */
> +

To reflect the code,
s/sorted-by-creation-order/sorted-by-reverse-creation-order/

> +
> +/* Remove LP from sorted-by-creation-order double-linked list.  */
> +

Likewise.

>  
> -  /* Next LWP in list.  */
> +  /* Previous and next pointers in double-linked list of known LWPs,
> +     sorted by reverse creation order.  */
> +  struct lwp_info *prev;

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2/htab 4/6] [Linux] Optimize PID -> struct lwp_info lookup
  2016-05-24  9:33       ` Yao Qi
@ 2016-05-24 13:47         ` Pedro Alves
  0 siblings, 0 replies; 19+ messages in thread
From: Pedro Alves @ 2016-05-24 13:47 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

On 05/24/2016 10:33 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
> Hi Pedro,
> Patch is good to me, a nit on comments,
> 

Thanks, I fixed them.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations
  2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
                   ` (6 preceding siblings ...)
  2016-05-19 15:11 ` [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
@ 2016-05-24 13:57 ` Pedro Alves
  7 siblings, 0 replies; 19+ messages in thread
From: Pedro Alves @ 2016-05-24 13:57 UTC (permalink / raw)
  To: gdb-patches

I've applied this to master now.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pushed/7.11.1] [PATCH 6/6] Fix PR gdb/19828: gdb -p <process from a container>: internal error
  2016-05-19 14:57 ` [PATCH 6/6] Fix PR gdb/19828: gdb -p <process from a container>: internal error Pedro Alves
  2016-05-23 10:14   ` Yao Qi
@ 2016-05-25 17:43   ` Pedro Alves
  1 sibling, 0 replies; 19+ messages in thread
From: Pedro Alves @ 2016-05-25 17:43 UTC (permalink / raw)
  To: gdb-patches

As discussed at:

  https://sourceware.org/ml/gdb-patches/2016-05/msg00449.html

Here's the version that I pushed to the 7.11 branch.  It's the exact
same code (thus fixes attach& as well), but it disables the
attach-many-short-lived-threads.exp test.

From 136613ef0c6850427317e57be1b644080ff6decb Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 25 May 2016 18:35:09 +0100
Subject: [PATCH] Fix PR gdb/19828: gdb -p <process from a container>: internal
 error

When GDB attaches to a process, it looks at the /proc/PID/task/ dir
for all clone threads of that process, and attaches to each of them.
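
Roughly, that enumeration amounts to listing the /proc/PID/task directory
and attaching to each entry found there.  A minimal illustrative sketch
follows (the helper name is invented, and GDB's bookkeeping and error
handling are omitted; this is not the code in linux-nat.c):

  #include <dirent.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>

  /* Attach to every LWP (clone thread) of PID.  Illustration only.
     A real debugger must re-list the directory until no new entries
     appear, since new clones can race with the scan.  */

  static void
  attach_all_lwps (pid_t pid)
  {
    char path[64];
    DIR *dir;
    struct dirent *ent;

    snprintf (path, sizeof path, "/proc/%d/task", (int) pid);
    dir = opendir (path);
    if (dir == NULL)
      return;

    while ((ent = readdir (dir)) != NULL)
      {
        pid_t lwp;

        if (ent->d_name[0] == '.')
          continue;  /* Skip "." and "..".  */

        lwp = (pid_t) atoi (ent->d_name);
        if (ptrace (PTRACE_ATTACH, lwp, NULL, NULL) == 0)
          printf ("attached to LWP %d\n", (int) lwp);
      }

    closedir (dir);
  }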

Usually, if there is more than one clone thread, it means the program
is multithreaded and linked with pthreads.  Thus when GDB soon after
attaching finds and loads a libthread_db matching the process, it'll
add a thread to the thread list for each of the initially found
lower-level LWPs.

If, however, GDB fails to find/load a matching libthread_db, nothing
is adding the LWPs to the thread list.  And because of that, "detach"
hits an internal error:

  (gdb) PASS: gdb.threads/clone-attach-detach.exp: fg attach 1: attach
  info threads
    Id   Target Id         Frame
  * 1    LWP 6891 "clone-attach-de" 0x00007f87e5fd0790 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
  (gdb) FAIL: gdb.threads/clone-attach-detach.exp: fg attach 1: info threads shows two LWPs
  detach
  .../src/gdb/thread.c:1010: internal-error: is_executing: Assertion `tp' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n)
  FAIL: gdb.threads/clone-attach-detach.exp: fg attach 1: detach (GDB internal error)

From here:

  ...
  #8  0x00000000007ba7cc in internal_error (file=0x98ea68 ".../src/gdb/thread.c", line=1010, fmt=0x98ea30 "%s: Assertion `%s' failed.")
      at .../src/gdb/common/errors.c:55
  #9  0x000000000064bb83 in is_executing (ptid=...) at .../src/gdb/thread.c:1010
  #10 0x00000000004c23bb in get_pending_status (lp=0x12c5cc0, status=0x7fffffffdc0c) at .../src/gdb/linux-nat.c:1235
  #11 0x00000000004c2738 in detach_callback (lp=0x12c5cc0, data=0x0) at .../src/gdb/linux-nat.c:1317
  #12 0x00000000004c1a2a in iterate_over_lwps (filter=..., callback=0x4c2599 <detach_callback>, data=0x0) at .../src/gdb/linux-nat.c:899
  #13 0x00000000004c295c in linux_nat_detach (ops=0xe7bd30, args=0x0, from_tty=1) at .../src/gdb/linux-nat.c:1358
  #14 0x000000000068284d in delegate_detach (self=0xe7bd30, arg1=0x0, arg2=1) at .../src/gdb/target-delegates.c:34
  #15 0x0000000000694141 in target_detach (args=0x0, from_tty=1) at .../src/gdb/target.c:2241
  #16 0x0000000000630582 in detach_command (args=0x0, from_tty=1) at .../src/gdb/infcmd.c:2975
  ...

Tested on x86-64 Fedora 23.  Also confirmed the test passes against
gdbserver with "maint set target-non-stop".

Unfortunately, making GDB add LWPs to the thread list sooner exposes
inefficiencies that in turn result in
gdb.threads/attach-many-short-lived-threads.exp timing out frequently.
Since that testcase is a contrived use case designed to stress
some aspects of attach/detach and thread listing, and is not
representative of real programs, this commit disables the test.

gdb/ChangeLog:
2016-05-25  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* linux-nat.c (attach_proc_task_lwp_callback): Mark the lwp
	resumed, and add the thread to GDB's thread list.

testsuite/ChangeLog:
2016-05-25  Pedro Alves  <palves@redhat.com>

	PR gdb/19828
	* gdb.threads/clone-attach-detach.c: New file.
	* gdb.threads/clone-attach-detach.exp: New file.
	* gdb.threads/attach-many-short-lived-threads.exp: Skip.
---
 gdb/ChangeLog                                      |  6 ++
 gdb/testsuite/ChangeLog                            |  7 ++
 gdb/linux-nat.c                                    | 10 +++
 .../attach-many-short-lived-threads.exp            |  4 +
 gdb/testsuite/gdb.threads/clone-attach-detach.c    | 66 +++++++++++++++
 gdb/testsuite/gdb.threads/clone-attach-detach.exp  | 98 ++++++++++++++++++++++
 6 files changed, 191 insertions(+)
 create mode 100644 gdb/testsuite/gdb.threads/clone-attach-detach.c
 create mode 100644 gdb/testsuite/gdb.threads/clone-attach-detach.exp

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 179acf0..c27c14d 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,6 +1,12 @@
 2016-05-25  Pedro Alves  <palves@redhat.com>
 
 	PR gdb/19828
+	* linux-nat.c (attach_proc_task_lwp_callback): Mark the lwp
+	resumed, and add the thread to GDB's thread list.
+
+2016-05-25  Pedro Alves  <palves@redhat.com>
+
+	PR gdb/19828
 	* linux-nat.c (get_pending_status): If the thread reported the
 	event to the core and it's pending, use the pending status signal
 	number.
diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index 5a13e4c..59dce0e 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2016-05-25  Pedro Alves  <palves@redhat.com>
+
+	PR gdb/19828
+	* gdb.threads/clone-attach-detach.c: New file.
+	* gdb.threads/clone-attach-detach.exp: New file.
+	* gdb.threads/attach-many-short-lived-threads.exp: Skip.
+
 2016-05-18  Simon Marchi  <simon.marchi@ericsson.com>
 
 	* gdb.mi/mi-threads-interrupt.c: New file.
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index ea171b0..54d745f 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1088,6 +1088,16 @@ attach_proc_task_lwp_callback (ptid_t ptid)
 	  /* We need to wait for a stop before being able to make the
 	     next ptrace call on this LWP.  */
 	  lp->must_set_ptrace_flags = 1;
+
+	  /* So that wait collects the SIGSTOP.  */
+	  lp->resumed = 1;
+
+	  /* Also add the LWP to gdb's thread list, in case a
+	     matching libthread_db is not found (or the process uses
+	     raw clone).  */
+	  add_thread (lp->ptid);
+	  set_running (lp->ptid, 1);
+	  set_executing (lp->ptid, 1);
 	}
 
       return 1;
diff --git a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
index ccb5e9b..5bb2e82 100644
--- a/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
+++ b/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp
@@ -21,6 +21,10 @@
 # end up leaving stale state behind that confuse the following
 # attach).
 
+# Disabled in gdb 7.11.1 because the fix for PR gdb/19828 exposes
+# latent inefficiencies that make this test often time out.
+return
+
 if {![can_spawn_for_attach]} {
     return 0
 }
diff --git a/gdb/testsuite/gdb.threads/clone-attach-detach.c b/gdb/testsuite/gdb.threads/clone-attach-detach.c
new file mode 100644
index 0000000..daa8f1f
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/clone-attach-detach.c
@@ -0,0 +1,66 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2016 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#define STACK_SIZE 0x1000
+
+int clone_pid;
+
+static int
+clone_fn (void *unused)
+{
+  /* Wait for alarm.  */
+  while (1)
+    sleep (1);
+  return 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  unsigned char *stack;
+  int res;
+
+  alarm (300);
+
+  stack = malloc (STACK_SIZE);
+  assert (stack != NULL);
+
+#define CLONE_FLAGS (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)
+
+#ifdef __ia64__
+  clone_pid = __clone2 (clone_fn, stack, STACK_SIZE, CLONE_FLAGS, NULL);
+#else
+  clone_pid = clone (clone_fn, stack + STACK_SIZE, CLONE_FLAGS, NULL);
+#endif
+
+  assert (clone_pid > 0);
+
+  /* Wait for alarm.  */
+  while (1)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/clone-attach-detach.exp b/gdb/testsuite/gdb.threads/clone-attach-detach.exp
new file mode 100644
index 0000000..42e9914
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/clone-attach-detach.exp
@@ -0,0 +1,98 @@
+# This testcase is part of GDB, the GNU debugger.
+
+# Copyright 2016 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test attach / detach from a process that uses raw clone.  We use raw
+# clone as proxy for when libthread_db is not available.
+
+# This only works on targets with the Linux kernel.
+if ![istarget *-*-linux*] {
+    return
+}
+
+if {![can_spawn_for_attach]} {
+    return 0
+}
+
+standard_testfile
+
+if [prepare_for_testing "failed to prepare" $testfile $srcfile {debug}] {
+    return -1
+}
+
+clean_restart ${binfile}
+
+set test_spawn_id [spawn_wait_for_attach $binfile]
+set testpid [spawn_id_get_pid $test_spawn_id]
+
+# Native/gdbserver.
+set thread_re "(LWP $decimal|Thread )"
+
+# Try attach / detach a few times, in case GDB ends up with stale
+# state after detaching.
+
+set attempts 3
+for {set attempt 1} {$attempt <= $attempts} {incr attempt} {
+    with_test_prefix "fg attach $attempt" {
+
+	gdb_test "attach $testpid" \
+	    "Attaching to program.*process $testpid.*" \
+	    "attach"
+
+	gdb_test "info threads" \
+	    "1.*${thread_re}.*\r\n.*2.*${thread_re}.*" \
+	    "info threads shows two LWPs"
+
+	gdb_test "detach" "Detaching from .*, process $testpid"
+    }
+}
+
+# Same, but use async/background attach.
+
+# If debugging with target remote, check whether the all-stop variant
+# of the RSP is being used.  If so, we can't run the background tests.
+if {[target_info exists gdb_protocol]
+    && ([target_info gdb_protocol] == "remote"
+	|| [target_info gdb_protocol] == "extended-remote")} {
+
+    set test "maint show target-non-stop"
+    gdb_test_multiple "maint show target-non-stop" $test {
+	-re "(is|currently) on.*$gdb_prompt $" {
+	}
+	-re "(is|currently) off.*$gdb_prompt $" {
+	    unsupported "bg attach: can't issue info threads while target is running"
+	    return 0
+	}
+    }
+}
+
+set attempts 3
+for {set attempt 1} {$attempt <= $attempts} {incr attempt} {
+    with_test_prefix "bg attach $attempt" {
+
+	gdb_test "attach $testpid &" \
+	    "Attaching to program.*process $testpid.*" \
+	    "attach"
+
+	gdb_test "info threads" \
+	    "1.*${thread_re}.*\\(running\\)\r\n.*2.*${thread_re}.*\\(running\\)" \
+	    "info threads shows two LWPs"
+
+	gdb_test "detach" "Detaching from .*, process $testpid"
+    }
+}
+
+kill_wait_spawned_process $test_spawn_id
-- 
2.5.5


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-05-25 17:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-19 14:48 [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
2016-05-19 14:48 ` [PATCH 3/6] [Linux] Avoid refetching core-of-thread if thread hasn't run Pedro Alves
2016-05-23 12:45   ` Yao Qi
2016-05-19 14:48 ` [PATCH 2/6] [Linux] Read vDSO range from /proc/PID/task/PID/maps instead of /proc/PID/maps Pedro Alves
2016-05-23 11:08   ` Yao Qi
2016-05-19 14:48 ` [PATCH 1/6] Linux native thread create/exit events support Pedro Alves
2016-05-23 10:49   ` Yao Qi
2016-05-23 13:00     ` Pedro Alves
2016-05-19 14:56 ` [PATCH 5/6] Make gdb/linux-nat.c consider a waitstatus pending on the infrun side Pedro Alves
2016-05-19 14:57 ` [PATCH 6/6] Fix PR gdb/19828: gdb -p <process from a container>: internal error Pedro Alves
2016-05-23 10:14   ` Yao Qi
2016-05-25 17:43   ` [pushed/7.11.1] " Pedro Alves
2016-05-19 14:57 ` [PATCH 4/6] [Linux] Optimize PID -> struct lwp_info lookup Pedro Alves
2016-05-23 13:12   ` Yao Qi
2016-05-23 18:10     ` [PATCH v2/htab " Pedro Alves
2016-05-24  9:33       ` Yao Qi
2016-05-24 13:47         ` Pedro Alves
2016-05-19 15:11 ` [PATCH 0/6] Fix PR gdb/19828 (attach -> internal error) and attach optimizations Pedro Alves
2016-05-24 13:57 ` Pedro Alves
