public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [RFA]corelow.c: Add tid to add_to_thread_list
@ 2010-08-03  8:49 Hui Zhu
  2010-08-05 18:44 ` Tom Tromey
  0 siblings, 1 reply; 10+ messages in thread
From: Hui Zhu @ 2010-08-03  8:49 UTC (permalink / raw)
  To: gdb-patches ml

Hi,

I found that from gdb 7.1 to gdb-cvs-head cannot analyze the core file
that get from kdump.
What I got:
[New <main task>]
[New Thread 2719]
../../src/gdb/thread.c:884: internal-error: switch_to_thread:
Assertion `inf != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
That is because:
 objdump -h ./vmcore

./vmcore:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 note0         00000a48  0000000000000000  0000000000000000  00000238  2**0
                  CONTENTS, READONLY
  1 .reg/0        000000d8  0000000000000000  0000000000000000  000002bc  2**2
                  CONTENTS
  2 .reg          000000d8  0000000000000000  0000000000000000  000002bc  2**2
                  CONTENTS
  3 .reg/2719     000000d8  0000000000000000  0000000000000000  00000420  2**2
                  CONTENTS
  4 .reg/0        000000d8  0000000000000000  0000000000000000  00000584  2**2
                  CONTENTS
  5 .reg/0        000000d8  0000000000000000  0000000000000000  000006e8  2**2
                  CONTENTS
Each of reg/n is a cpu core note.  It will be a GDB thread.  n is the
prstatus.pr_pid that will be the thread lwpid.  Because the 3 threads
pid is same, so GDB get error.

I make a patch for kernel (http://lkml.org/lkml/2010/8/3/75) but they
think it should be fixed in user space.
So I add the tid to add_to_thread_list.

Please help me review it.

Thanks,
Hui

2010-08-03  Hui Zhu  <teawater@gmail.com>

	* corelow.c(add_to_thread_list): Add tid.


---
 corelow.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/corelow.c
+++ b/corelow.c
@@ -244,7 +244,7 @@ add_to_thread_list (bfd *abfd, asection
 {
   ptid_t ptid;
   int core_tid;
-  int pid, lwpid;
+  int pid, lwpid, tid;
   asection *reg_sect = (asection *) reg_sect_arg;

   if (strncmp (bfd_section_name (abfd, asect), ".reg/", 5) != 0)
@@ -278,7 +278,14 @@ add_to_thread_list (bfd *abfd, asection
   if (current_inferior ()->pid == 0)
     inferior_appeared (current_inferior (), pid);

-  ptid = ptid_build (pid, lwpid, 0);
+  tid = 0;
+get_ptid:
+  ptid = ptid_build (pid, lwpid, tid);
+  if (find_thread_ptid (ptid))
+    {
+      tid ++;
+      goto get_ptid;
+    }

   add_thread (ptid);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-03  8:49 [RFA]corelow.c: Add tid to add_to_thread_list Hui Zhu
@ 2010-08-05 18:44 ` Tom Tromey
  2010-08-06  2:56   ` Hui Zhu
  0 siblings, 1 reply; 10+ messages in thread
From: Tom Tromey @ 2010-08-05 18:44 UTC (permalink / raw)
  To: Hui Zhu; +Cc: gdb-patches ml

>>>>> ">" == Hui Zhu <teawater@gmail.com> writes:

>> I make a patch for kernel (http://lkml.org/lkml/2010/8/3/75) but they
>> think it should be fixed in user space.
>> So I add the tid to add_to_thread_list.

I don't know about this area of gdb, so I can't really comment on the
semantics of the change, but I do have a comment about how the change is
written:

>> -  ptid = ptid_build (pid, lwpid, 0);
>> +  tid = 0;
>> +get_ptid:
>> +  ptid = ptid_build (pid, lwpid, tid);
>> +  if (find_thread_ptid (ptid))
>> +    {
>> +      tid ++;
>> +      goto get_ptid;
>> +    }

I think this would be more readably written without 'goto', as a loop.

Tom

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-05 18:44 ` Tom Tromey
@ 2010-08-06  2:56   ` Hui Zhu
  2010-08-06  9:57     ` Pedro Alves
  0 siblings, 1 reply; 10+ messages in thread
From: Hui Zhu @ 2010-08-06  2:56 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches ml

On Fri, Aug 6, 2010 at 02:44, Tom Tromey <tromey@redhat.com> wrote:
>>>>>> ">" == Hui Zhu <teawater@gmail.com> writes:
>
>>> I make a patch for kernel (http://lkml.org/lkml/2010/8/3/75) but they
>>> think it should be fixed in user space.
>>> So I add the tid to add_to_thread_list.
>
> I don't know about this area of gdb, so I can't really comment on the
> semantics of the change, but I do have a comment about how the change is
> written:
>
>>> -  ptid = ptid_build (pid, lwpid, 0);
>>> +  tid = 0;
>>> +get_ptid:
>>> +  ptid = ptid_build (pid, lwpid, tid);
>>> +  if (find_thread_ptid (ptid))
>>> +    {
>>> +      tid ++;
>>> +      goto get_ptid;
>>> +    }
>
> I think this would be more readably written without 'goto', as a loop.
>
> Tom
>

Thanks.  The prev one is too casual.  I make a new one that use the loop.

Best,
Hui

2010-08-06  Hui Zhu  <teawater@gmail.com>

	* corelow.c(add_to_thread_list): Add tid.

---
 corelow.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/corelow.c
+++ b/corelow.c
@@ -244,7 +244,7 @@ add_to_thread_list (bfd *abfd, asection
 {
   ptid_t ptid;
   int core_tid;
-  int pid, lwpid;
+  int pid, lwpid, tid;
   asection *reg_sect = (asection *) reg_sect_arg;

   if (strncmp (bfd_section_name (abfd, asect), ".reg/", 5) != 0)
@@ -278,7 +278,13 @@ add_to_thread_list (bfd *abfd, asection
   if (current_inferior ()->pid == 0)
     inferior_appeared (current_inferior (), pid);

-  ptid = ptid_build (pid, lwpid, 0);
+  tid = 0;
+  do
+    {
+      ptid = ptid_build (pid, lwpid, tid);
+      tid ++;
+    }
+  while (find_thread_ptid (ptid));

   add_thread (ptid);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-06  2:56   ` Hui Zhu
@ 2010-08-06  9:57     ` Pedro Alves
  2010-08-06 16:48       ` Hui Zhu
  0 siblings, 1 reply; 10+ messages in thread
From: Pedro Alves @ 2010-08-06  9:57 UTC (permalink / raw)
  To: gdb-patches; +Cc: Hui Zhu

On Friday 06 August 2010 03:55:51, Hui Zhu wrote:

> 2010-08-06  Hui Zhu  <teawater@gmail.com>
> 
> 	* corelow.c(add_to_thread_list): Add tid.
> 
> ---
>  corelow.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> --- a/corelow.c
> +++ b/corelow.c
> @@ -244,7 +244,7 @@ add_to_thread_list (bfd *abfd, asection
>  {
>    ptid_t ptid;
>    int core_tid;
> -  int pid, lwpid;
> +  int pid, lwpid, tid;
>    asection *reg_sect = (asection *) reg_sect_arg;
> 
>    if (strncmp (bfd_section_name (abfd, asect), ".reg/", 5) != 0)
> @@ -278,7 +278,13 @@ add_to_thread_list (bfd *abfd, asection
>    if (current_inferior ()->pid == 0)
>      inferior_appeared (current_inferior (), pid);
> 
> -  ptid = ptid_build (pid, lwpid, 0);
> +  tid = 0;
> +  do
> +    {
> +      ptid = ptid_build (pid, lwpid, tid);
> +      tid ++;
> +    }
> +  while (find_thread_ptid (ptid));
> 
>    add_thread (ptid);

Sorry, no.  It's not a good idea to have the corelow target
using both the lwpid and the tid fields.  It leaves no room for a
thread_stratum target on top to use.  Going by what someone
said on the other thread:

> The goal was always that something could post process the output of
> the kernel crashdump and create something that is gdb compatible.  It
> looks to me like it would take just a moment to strip out all of the
> idle threads.

What are exactly these "threads" with no PID?  How useful is it for gdb
to expose them as threads?  Are these the idle threads, one per core, not
associated with any process?  Are we talking about a regular process core
dump, or some other core dump?  From the comment quoted above, it appears
that it would be expected that something just stripped out / ignored
these "threads"/notes.  Say, a post processor tool, or bfd, or gdb.

BTW, with this particular patch, you've only gotten past this one
problem, but, you'll be tripping on others.  For example, all register
accesses for lwpid=XXX,tid=0, lwpid=XXX,tid=1, lwpid=XXX,tid=2, etc.,
will be going to the same .reg/XXX section.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-06  9:57     ` Pedro Alves
@ 2010-08-06 16:48       ` Hui Zhu
  2010-08-06 17:18         ` Pedro Alves
  0 siblings, 1 reply; 10+ messages in thread
From: Hui Zhu @ 2010-08-06 16:48 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Fri, Aug 6, 2010 at 17:57, Pedro Alves <pedro@codesourcery.com> wrote:
> On Friday 06 August 2010 03:55:51, Hui Zhu wrote:
>
>> 2010-08-06  Hui Zhu  <teawater@gmail.com>
>>
>>       * corelow.c(add_to_thread_list): Add tid.
>>
>> ---
>>  corelow.c |   10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> --- a/corelow.c
>> +++ b/corelow.c
>> @@ -244,7 +244,7 @@ add_to_thread_list (bfd *abfd, asection
>>  {
>>    ptid_t ptid;
>>    int core_tid;
>> -  int pid, lwpid;
>> +  int pid, lwpid, tid;
>>    asection *reg_sect = (asection *) reg_sect_arg;
>>
>>    if (strncmp (bfd_section_name (abfd, asect), ".reg/", 5) != 0)
>> @@ -278,7 +278,13 @@ add_to_thread_list (bfd *abfd, asection
>>    if (current_inferior ()->pid == 0)
>>      inferior_appeared (current_inferior (), pid);
>>
>> -  ptid = ptid_build (pid, lwpid, 0);
>> +  tid = 0;
>> +  do
>> +    {
>> +      ptid = ptid_build (pid, lwpid, tid);
>> +      tid ++;
>> +    }
>> +  while (find_thread_ptid (ptid));
>>
>>    add_thread (ptid);
>
> Sorry, no.  It's not a good idea to have the corelow target
> using both the lwpid and the tid fields.  It leaves no room for a
> thread_stratum target on top to use.  Going by what someone
> said on the other thread:
>
>> The goal was always that something could post process the output of
>> the kernel crashdump and create something that is gdb compatible.  It
>> looks to me like it would take just a moment to strip out all of the
>> idle threads.
>
> What are exactly these "threads" with no PID?  How useful is it for gdb
> to expose them as threads?  Are these the idle threads, one per core, not
> associated with any process?  Are we talking about a regular process core
> dump, or some other core dump?  From the comment quoted above, it appears
> that it would be expected that something just stripped out / ignored
> these "threads"/notes.  Say, a post processor tool, or bfd, or gdb.

The root cause about this issue is the idle thread's pid is 0.  If
more than one cpu is in idle and each cpu will be a thread in core
file, we got a core file that have some thread ptid is same.
For now, gdb cannot handle it:
struct thread_info *
add_thread_silent (ptid_t ptid)
{
  struct thread_info *tp;

  tp = find_thread_ptid (ptid);
  if (tp)
    /* Found an old thread with the same id.  It has to be dead,
       otherwise we wouldn't be adding a new thread with the same id.
       The OS is reusing this id --- delete it, and recreate a new
       one.  */

I make a patch that can ignore the same ptid thread and output a warning.

>
> BTW, with this particular patch, you've only gotten past this one
> problem, but, you'll be tripping on others.  For example, all register
> accesses for lwpid=XXX,tid=0, lwpid=XXX,tid=1, lwpid=XXX,tid=2, etc.,
> will be going to the same .reg/XXX section.

Yes, that is a trouble.

2010-08-07  Hui Zhu  <teawater@gmail.com>

	* corelow.c(add_to_thread_list): Ignore the thread if already
	a ptid is in thread list.

---
 corelow.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/corelow.c
+++ b/corelow.c
@@ -279,6 +279,12 @@ add_to_thread_list (bfd *abfd, asection
     inferior_appeared (current_inferior (), pid);

   ptid = ptid_build (pid, lwpid, 0);
+  if (find_thread_ptid (ptid))
+    {
+      warning (_("Get more than one thread ptid is %s.  Ignore it."),
+	       target_pid_to_str (ptid));
+      return;
+    }

   add_thread (ptid);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-06 16:48       ` Hui Zhu
@ 2010-08-06 17:18         ` Pedro Alves
  2010-08-06 20:06           ` Pedro Alves
  0 siblings, 1 reply; 10+ messages in thread
From: Pedro Alves @ 2010-08-06 17:18 UTC (permalink / raw)
  To: Hui Zhu; +Cc: gdb-patches

On Friday 06 August 2010 17:47:53, Hui Zhu wrote:

> The root cause about this issue is the idle thread's pid is 0.  

I'm still interested in answers to the questions I wrote before.
Reading the thread again, I understand this a kernel dump core.
Am I correct?  I've never loaded one in gdb, hence my questions.
From your early objdump output, 

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 note0         00000a48  0000000000000000  0000000000000000  00000238  2**0
                  CONTENTS, READONLY
  1 .reg/0        000000d8  0000000000000000  0000000000000000  000002bc  2**2
                  CONTENTS
  2 .reg          000000d8  0000000000000000  0000000000000000  000002bc  2**2
                  CONTENTS
  3 .reg/2719     000000d8  0000000000000000  0000000000000000  00000420  2**2
                  CONTENTS
  4 .reg/0        000000d8  0000000000000000  0000000000000000  00000584  2**2
                  CONTENTS
  5 .reg/0        000000d8  0000000000000000  0000000000000000  000006e8  2**2


there's always one thread per core, never more.  Is that correct?  Is there
any indication in the core notes that would allow us to identify this core
as a kernel core, not an application core?  IMO, since we're debugging at
the kernel level, we'd instead use that info to teach bfd info building the
.reg sections as, say:

  0 note0         00000a48  0000000000000000  0000000000000000  00000238  2**0
                  CONTENTS, READONLY
  1 .reg/1        000000d8  0000000000000000  0000000000000000  000002bc  2**2
                  CONTENTS
  2 .reg          000000d8  0000000000000000  0000000000000000  000002bc  2**2
                  CONTENTS
  3 .reg/2        000000d8  0000000000000000  0000000000000000  00000420  2**2
                  CONTENTS
  4 .reg/3        000000d8  0000000000000000  0000000000000000  00000584  2**2
                  CONTENTS
  5 .reg/4        000000d8  0000000000000000  0000000000000000  000006e8  2**2

that is, identify the cores, not the process the core happened to be
running.  

> If
> more than one cpu is in idle and each cpu will be a thread in core
> file, we got a core file that have some thread ptid is same.
> For now, gdb cannot handle it:
> struct thread_info *
> add_thread_silent (ptid_t ptid)
> {

If this function hit an internal error in this scenario, then
it has a bug.  I think Maciej wrote a patch to fix it in our
internal tree.  I'll try to look for it.  Note that with this
fixed, gdb would still discard all idle threads but one,
and, when accessing the registers of the one that stays, we'd
be accessing the wrong .reg section.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-06 17:18         ` Pedro Alves
@ 2010-08-06 20:06           ` Pedro Alves
  2010-08-06 20:50             ` Maciej W. Rozycki
  2010-08-09  2:28             ` Hui Zhu
  0 siblings, 2 replies; 10+ messages in thread
From: Pedro Alves @ 2010-08-06 20:06 UTC (permalink / raw)
  To: gdb-patches, Maciej W. Rozycki; +Cc: Hui Zhu

On Friday 06 August 2010 18:17:48, Pedro Alves wrote:
> On Friday 06 August 2010 17:47:53, Hui Zhu wrote:

> > struct thread_info *
> > add_thread_silent (ptid_t ptid)
> > {
> 
> If this function hit an internal error in this scenario, then
> it has a bug.  I think Maciej wrote a patch to fix it in our
> internal tree.  I'll try to look for it.  Note that with this
> fixed, gdb would still discard all idle threads but one,
> and, when accessing the registers of the one that stays, we'd
> be accessing the wrong .reg section.

Here's the patch.  (Making Maciej's words mine, and tweaking
for context, ) this assertion:

.../gdb/thread.c:880: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

is related to this change:

2009-05-24  Pedro Alves  <pedro@codesourcery.com>

        * thread.c (new_thread): New function.
        (add_thread_silent): Use it.

which removed a piece of code that actually added minus_one_ptid to
the list of threads.  The current state of switch_to_thread doesn't
allow minus_one_ptid in this context, but null_ptid can be safely used
instead to the same effect.  The change was bogus --- the intent was
to use new_thread (minus_one_ptid), not new_thread (ptid).
Maciej's patch fixes this just as well (or perhaps better), so I've
applied to mainline, and the 7.2 and 7.1 branches.

-- 
Pedro Alves

2010-08-06  Maciej W. Rozycki  <macro@codesourcery.com>

	gdb/
	* thread.c (add_thread_silent): Use null_ptid instead of
	minus_one_ptid while getting rid of stale inferior_ptid.

Index: gdb/thread.c
===================================================================
--- gdb/thread.c	(revision 283123)
+++ gdb/thread.c	(working copy)
@@ -187,11 +187,11 @@ add_thread_silent (ptid_t ptid)
 
       if (ptid_equal (inferior_ptid, ptid))
 	{
-	  tp = new_thread (ptid);
+	  tp = new_thread (null_ptid);
 
 	  /* Make switch_to_thread not read from the thread.  */
 	  tp->state_ = THREAD_EXITED;
-	  switch_to_thread (minus_one_ptid);
+	  switch_to_thread (null_ptid);
 
 	  /* Now we can delete it.  */
 	  delete_thread (ptid);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-06 20:06           ` Pedro Alves
@ 2010-08-06 20:50             ` Maciej W. Rozycki
  2010-08-09  2:28             ` Hui Zhu
  1 sibling, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2010-08-06 20:50 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches, Hui Zhu

On Fri, 6 Aug 2010, Pedro Alves wrote:

> Maciej's patch fixes this just as well (or perhaps better), so I've
> applied to mainline, and the 7.2 and 7.1 branches.

 Thanks for pushing it.

  Maciej

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-06 20:06           ` Pedro Alves
  2010-08-06 20:50             ` Maciej W. Rozycki
@ 2010-08-09  2:28             ` Hui Zhu
  2010-08-09 14:48               ` Pedro Alves
  1 sibling, 1 reply; 10+ messages in thread
From: Hui Zhu @ 2010-08-09  2:28 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches, Maciej W. Rozycki

Thanks.

It work on my part got:

Reading symbols from /home/teawater/tmp/vmlinux...done.
[New process 1]
[New process 1]
[New process 1]
[New LWP 2731]
#0  0xffffffff8020b037 in __sti_mwait () at include2/asm/processor.h:719
719	include2/asm/processor.h: No such file or directory.
	in include2/asm/processor.h
(gdb) info threads
  4 LWP 2731  crash_setup_regs (regs=0x0) at include2/asm/kexec.h:155
* 3 process 1  0xffffffff8020b037 in __sti_mwait () at
include2/asm/processor.h:719

Looks this is a fit way without change libbfd.

Best,
Hui

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFA]corelow.c: Add tid to add_to_thread_list
  2010-08-09  2:28             ` Hui Zhu
@ 2010-08-09 14:48               ` Pedro Alves
  0 siblings, 0 replies; 10+ messages in thread
From: Pedro Alves @ 2010-08-09 14:48 UTC (permalink / raw)
  To: Hui Zhu; +Cc: gdb-patches, Maciej W. Rozycki

On Monday 09 August 2010 03:28:23, Hui Zhu wrote:

> It work on my part got:
> 
> Reading symbols from /home/teawater/tmp/vmlinux...done.
> [New process 1]
> [New process 1]
> [New process 1]
> [New LWP 2731]
> #0  0xffffffff8020b037 in __sti_mwait () at include2/asm/processor.h:719
> 719	include2/asm/processor.h: No such file or directory.
> 	in include2/asm/processor.h
> (gdb) info threads
>   4 LWP 2731  crash_setup_regs (regs=0x0) at include2/asm/kexec.h:155
> * 3 process 1  0xffffffff8020b037 in __sti_mwait () at
> include2/asm/processor.h:719

Thanks for confirming.

> Looks this is a fit way without change libbfd.

IMO, I'd still be better to teach bfd about these cores as I
suggested before, but it doesn't itch me enough to go
fix it myself.  ;-)

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-08-09 14:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-03  8:49 [RFA]corelow.c: Add tid to add_to_thread_list Hui Zhu
2010-08-05 18:44 ` Tom Tromey
2010-08-06  2:56   ` Hui Zhu
2010-08-06  9:57     ` Pedro Alves
2010-08-06 16:48       ` Hui Zhu
2010-08-06 17:18         ` Pedro Alves
2010-08-06 20:06           ` Pedro Alves
2010-08-06 20:50             ` Maciej W. Rozycki
2010-08-09  2:28             ` Hui Zhu
2010-08-09 14:48               ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).