* [RFA]corelow.c: Add tid to add_to_thread_list @ 2010-08-03 8:49 Hui Zhu 2010-08-05 18:44 ` Tom Tromey 0 siblings, 1 reply; 10+ messages in thread From: Hui Zhu @ 2010-08-03 8:49 UTC (permalink / raw) To: gdb-patches ml Hi, I found that from gdb 7.1 to gdb-cvs-head cannot analyze the core file that get from kdump. What I got: [New <main task>] [New Thread 2719] ../../src/gdb/thread.c:884: internal-error: switch_to_thread: Assertion `inf != NULL' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) That is because: objdump -h ./vmcore ./vmcore: file format elf64-x86-64 Sections: Idx Name Size VMA LMA File off Algn 0 note0 00000a48 0000000000000000 0000000000000000 00000238 2**0 CONTENTS, READONLY 1 .reg/0 000000d8 0000000000000000 0000000000000000 000002bc 2**2 CONTENTS 2 .reg 000000d8 0000000000000000 0000000000000000 000002bc 2**2 CONTENTS 3 .reg/2719 000000d8 0000000000000000 0000000000000000 00000420 2**2 CONTENTS 4 .reg/0 000000d8 0000000000000000 0000000000000000 00000584 2**2 CONTENTS 5 .reg/0 000000d8 0000000000000000 0000000000000000 000006e8 2**2 CONTENTS Each of reg/n is a cpu core note. It will be a GDB thread. n is the prstatus.pr_pid that will be the thread lwpid. Because the 3 threads pid is same, so GDB get error. I make a patch for kernel (http://lkml.org/lkml/2010/8/3/75) but they think it should be fixed in user space. So I add the tid to add_to_thread_list. Please help me review it. Thanks, Hui 2010-08-03 Hui Zhu <teawater@gmail.com> * corelow.c(add_to_thread_list): Add tid. --- corelow.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) --- a/corelow.c +++ b/corelow.c @@ -244,7 +244,7 @@ add_to_thread_list (bfd *abfd, asection { ptid_t ptid; int core_tid; - int pid, lwpid; + int pid, lwpid, tid; asection *reg_sect = (asection *) reg_sect_arg; if (strncmp (bfd_section_name (abfd, asect), ".reg/", 5) != 0) @@ -278,7 +278,14 @@ add_to_thread_list (bfd *abfd, asection if (current_inferior ()->pid == 0) inferior_appeared (current_inferior (), pid); - ptid = ptid_build (pid, lwpid, 0); + tid = 0; +get_ptid: + ptid = ptid_build (pid, lwpid, tid); + if (find_thread_ptid (ptid)) + { + tid ++; + goto get_ptid; + } add_thread (ptid); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-03 8:49 [RFA]corelow.c: Add tid to add_to_thread_list Hui Zhu @ 2010-08-05 18:44 ` Tom Tromey 2010-08-06 2:56 ` Hui Zhu 0 siblings, 1 reply; 10+ messages in thread From: Tom Tromey @ 2010-08-05 18:44 UTC (permalink / raw) To: Hui Zhu; +Cc: gdb-patches ml >>>>> ">" == Hui Zhu <teawater@gmail.com> writes: >> I make a patch for kernel (http://lkml.org/lkml/2010/8/3/75) but they >> think it should be fixed in user space. >> So I add the tid to add_to_thread_list. I don't know about this area of gdb, so I can't really comment on the semantics of the change, but I do have a comment about how the change is written: >> - ptid = ptid_build (pid, lwpid, 0); >> + tid = 0; >> +get_ptid: >> + ptid = ptid_build (pid, lwpid, tid); >> + if (find_thread_ptid (ptid)) >> + { >> + tid ++; >> + goto get_ptid; >> + } I think this would be more readably written without 'goto', as a loop. Tom ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-05 18:44 ` Tom Tromey @ 2010-08-06 2:56 ` Hui Zhu 2010-08-06 9:57 ` Pedro Alves 0 siblings, 1 reply; 10+ messages in thread From: Hui Zhu @ 2010-08-06 2:56 UTC (permalink / raw) To: Tom Tromey; +Cc: gdb-patches ml On Fri, Aug 6, 2010 at 02:44, Tom Tromey <tromey@redhat.com> wrote: >>>>>> ">" == Hui Zhu <teawater@gmail.com> writes: > >>> I make a patch for kernel (http://lkml.org/lkml/2010/8/3/75) but they >>> think it should be fixed in user space. >>> So I add the tid to add_to_thread_list. > > I don't know about this area of gdb, so I can't really comment on the > semantics of the change, but I do have a comment about how the change is > written: > >>> - ptid = ptid_build (pid, lwpid, 0); >>> + tid = 0; >>> +get_ptid: >>> + ptid = ptid_build (pid, lwpid, tid); >>> + if (find_thread_ptid (ptid)) >>> + { >>> + tid ++; >>> + goto get_ptid; >>> + } > > I think this would be more readably written without 'goto', as a loop. > > Tom > Thanks. The prev one is too casual. I make a new one that use the loop. Best, Hui 2010-08-06 Hui Zhu <teawater@gmail.com> * corelow.c(add_to_thread_list): Add tid. --- corelow.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) --- a/corelow.c +++ b/corelow.c @@ -244,7 +244,7 @@ add_to_thread_list (bfd *abfd, asection { ptid_t ptid; int core_tid; - int pid, lwpid; + int pid, lwpid, tid; asection *reg_sect = (asection *) reg_sect_arg; if (strncmp (bfd_section_name (abfd, asect), ".reg/", 5) != 0) @@ -278,7 +278,13 @@ add_to_thread_list (bfd *abfd, asection if (current_inferior ()->pid == 0) inferior_appeared (current_inferior (), pid); - ptid = ptid_build (pid, lwpid, 0); + tid = 0; + do + { + ptid = ptid_build (pid, lwpid, tid); + tid ++; + } + while (find_thread_ptid (ptid)); add_thread (ptid); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-06 2:56 ` Hui Zhu @ 2010-08-06 9:57 ` Pedro Alves 2010-08-06 16:48 ` Hui Zhu 0 siblings, 1 reply; 10+ messages in thread From: Pedro Alves @ 2010-08-06 9:57 UTC (permalink / raw) To: gdb-patches; +Cc: Hui Zhu On Friday 06 August 2010 03:55:51, Hui Zhu wrote: > 2010-08-06 Hui Zhu <teawater@gmail.com> > > * corelow.c(add_to_thread_list): Add tid. > > --- > corelow.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > --- a/corelow.c > +++ b/corelow.c > @@ -244,7 +244,7 @@ add_to_thread_list (bfd *abfd, asection > { > ptid_t ptid; > int core_tid; > - int pid, lwpid; > + int pid, lwpid, tid; > asection *reg_sect = (asection *) reg_sect_arg; > > if (strncmp (bfd_section_name (abfd, asect), ".reg/", 5) != 0) > @@ -278,7 +278,13 @@ add_to_thread_list (bfd *abfd, asection > if (current_inferior ()->pid == 0) > inferior_appeared (current_inferior (), pid); > > - ptid = ptid_build (pid, lwpid, 0); > + tid = 0; > + do > + { > + ptid = ptid_build (pid, lwpid, tid); > + tid ++; > + } > + while (find_thread_ptid (ptid)); > > add_thread (ptid); Sorry, no. It's not a good idea to have the corelow target using both the lwpid and the tid fields. It leaves no room for a thread_stratum target on top to use. Going by what someone said on the other thread: > The goal was always that something could post process the output of > the kernel crashdump and create something that is gdb compatible. It > looks to me like it would take just a moment to strip out all of the > idle threads. What are exactly these "threads" with no PID? How useful is it for gdb to expose them as threads? Are these the idle threads, one per core, not associated with any process? Are we talking about a regular process core dump, or some other core dump? From the comment quoted above, it appears that it would be expected that something just stripped out / ignored these "threads"/notes. Say, a post processor tool, or bfd, or gdb. BTW, with this particular patch, you've only gotten past this one problem, but, you'll be tripping on others. For example, all register accesses for lwpid=XXX,tid=0, lwpid=XXX,tid=1, lwpid=XXX,tid=2, etc., will be going to the same .reg/XXX section. -- Pedro Alves ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-06 9:57 ` Pedro Alves @ 2010-08-06 16:48 ` Hui Zhu 2010-08-06 17:18 ` Pedro Alves 0 siblings, 1 reply; 10+ messages in thread From: Hui Zhu @ 2010-08-06 16:48 UTC (permalink / raw) To: Pedro Alves; +Cc: gdb-patches On Fri, Aug 6, 2010 at 17:57, Pedro Alves <pedro@codesourcery.com> wrote: > On Friday 06 August 2010 03:55:51, Hui Zhu wrote: > >> 2010-08-06 Hui Zhu <teawater@gmail.com> >> >> * corelow.c(add_to_thread_list): Add tid. >> >> --- >> corelow.c | 10 ++++++++-- >> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> --- a/corelow.c >> +++ b/corelow.c >> @@ -244,7 +244,7 @@ add_to_thread_list (bfd *abfd, asection >> { >> ptid_t ptid; >> int core_tid; >> - int pid, lwpid; >> + int pid, lwpid, tid; >> asection *reg_sect = (asection *) reg_sect_arg; >> >> if (strncmp (bfd_section_name (abfd, asect), ".reg/", 5) != 0) >> @@ -278,7 +278,13 @@ add_to_thread_list (bfd *abfd, asection >> if (current_inferior ()->pid == 0) >> inferior_appeared (current_inferior (), pid); >> >> - ptid = ptid_build (pid, lwpid, 0); >> + tid = 0; >> + do >> + { >> + ptid = ptid_build (pid, lwpid, tid); >> + tid ++; >> + } >> + while (find_thread_ptid (ptid)); >> >> add_thread (ptid); > > Sorry, no. It's not a good idea to have the corelow target > using both the lwpid and the tid fields. It leaves no room for a > thread_stratum target on top to use. Going by what someone > said on the other thread: > >> The goal was always that something could post process the output of >> the kernel crashdump and create something that is gdb compatible. It >> looks to me like it would take just a moment to strip out all of the >> idle threads. > > What are exactly these "threads" with no PID? How useful is it for gdb > to expose them as threads? Are these the idle threads, one per core, not > associated with any process? Are we talking about a regular process core > dump, or some other core dump? From the comment quoted above, it appears > that it would be expected that something just stripped out / ignored > these "threads"/notes. Say, a post processor tool, or bfd, or gdb. The root cause about this issue is the idle thread's pid is 0. If more than one cpu is in idle and each cpu will be a thread in core file, we got a core file that have some thread ptid is same. For now, gdb cannot handle it: struct thread_info * add_thread_silent (ptid_t ptid) { struct thread_info *tp; tp = find_thread_ptid (ptid); if (tp) /* Found an old thread with the same id. It has to be dead, otherwise we wouldn't be adding a new thread with the same id. The OS is reusing this id --- delete it, and recreate a new one. */ I make a patch that can ignore the same ptid thread and output a warning. > > BTW, with this particular patch, you've only gotten past this one > problem, but, you'll be tripping on others. For example, all register > accesses for lwpid=XXX,tid=0, lwpid=XXX,tid=1, lwpid=XXX,tid=2, etc., > will be going to the same .reg/XXX section. Yes, that is a trouble. 2010-08-07 Hui Zhu <teawater@gmail.com> * corelow.c(add_to_thread_list): Ignore the thread if already a ptid is in thread list. --- corelow.c | 6 ++++++ 1 file changed, 6 insertions(+) --- a/corelow.c +++ b/corelow.c @@ -279,6 +279,12 @@ add_to_thread_list (bfd *abfd, asection inferior_appeared (current_inferior (), pid); ptid = ptid_build (pid, lwpid, 0); + if (find_thread_ptid (ptid)) + { + warning (_("Get more than one thread ptid is %s. Ignore it."), + target_pid_to_str (ptid)); + return; + } add_thread (ptid); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-06 16:48 ` Hui Zhu @ 2010-08-06 17:18 ` Pedro Alves 2010-08-06 20:06 ` Pedro Alves 0 siblings, 1 reply; 10+ messages in thread From: Pedro Alves @ 2010-08-06 17:18 UTC (permalink / raw) To: Hui Zhu; +Cc: gdb-patches On Friday 06 August 2010 17:47:53, Hui Zhu wrote: > The root cause about this issue is the idle thread's pid is 0. I'm still interested in answers to the questions I wrote before. Reading the thread again, I understand this a kernel dump core. Am I correct? I've never loaded one in gdb, hence my questions. From your early objdump output, Sections: Idx Name Size VMA LMA File off Algn 0 note0 00000a48 0000000000000000 0000000000000000 00000238 2**0 CONTENTS, READONLY 1 .reg/0 000000d8 0000000000000000 0000000000000000 000002bc 2**2 CONTENTS 2 .reg 000000d8 0000000000000000 0000000000000000 000002bc 2**2 CONTENTS 3 .reg/2719 000000d8 0000000000000000 0000000000000000 00000420 2**2 CONTENTS 4 .reg/0 000000d8 0000000000000000 0000000000000000 00000584 2**2 CONTENTS 5 .reg/0 000000d8 0000000000000000 0000000000000000 000006e8 2**2 there's always one thread per core, never more. Is that correct? Is there any indication in the core notes that would allow us to identify this core as a kernel core, not an application core? IMO, since we're debugging at the kernel level, we'd instead use that info to teach bfd info building the .reg sections as, say: 0 note0 00000a48 0000000000000000 0000000000000000 00000238 2**0 CONTENTS, READONLY 1 .reg/1 000000d8 0000000000000000 0000000000000000 000002bc 2**2 CONTENTS 2 .reg 000000d8 0000000000000000 0000000000000000 000002bc 2**2 CONTENTS 3 .reg/2 000000d8 0000000000000000 0000000000000000 00000420 2**2 CONTENTS 4 .reg/3 000000d8 0000000000000000 0000000000000000 00000584 2**2 CONTENTS 5 .reg/4 000000d8 0000000000000000 0000000000000000 000006e8 2**2 that is, identify the cores, not the process the core happened to be running. > If > more than one cpu is in idle and each cpu will be a thread in core > file, we got a core file that have some thread ptid is same. > For now, gdb cannot handle it: > struct thread_info * > add_thread_silent (ptid_t ptid) > { If this function hit an internal error in this scenario, then it has a bug. I think Maciej wrote a patch to fix it in our internal tree. I'll try to look for it. Note that with this fixed, gdb would still discard all idle threads but one, and, when accessing the registers of the one that stays, we'd be accessing the wrong .reg section. -- Pedro Alves ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-06 17:18 ` Pedro Alves @ 2010-08-06 20:06 ` Pedro Alves 2010-08-06 20:50 ` Maciej W. Rozycki 2010-08-09 2:28 ` Hui Zhu 0 siblings, 2 replies; 10+ messages in thread From: Pedro Alves @ 2010-08-06 20:06 UTC (permalink / raw) To: gdb-patches, Maciej W. Rozycki; +Cc: Hui Zhu On Friday 06 August 2010 18:17:48, Pedro Alves wrote: > On Friday 06 August 2010 17:47:53, Hui Zhu wrote: > > struct thread_info * > > add_thread_silent (ptid_t ptid) > > { > > If this function hit an internal error in this scenario, then > it has a bug. I think Maciej wrote a patch to fix it in our > internal tree. I'll try to look for it. Note that with this > fixed, gdb would still discard all idle threads but one, > and, when accessing the registers of the one that stays, we'd > be accessing the wrong .reg section. Here's the patch. (Making Maciej's words mine, and tweaking for context, ) this assertion: .../gdb/thread.c:880: internal-error: switch_to_thread: Assertion `inf != NULL' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) is related to this change: 2009-05-24 Pedro Alves <pedro@codesourcery.com> * thread.c (new_thread): New function. (add_thread_silent): Use it. which removed a piece of code that actually added minus_one_ptid to the list of threads. The current state of switch_to_thread doesn't allow minus_one_ptid in this context, but null_ptid can be safely used instead to the same effect. The change was bogus --- the intent was to use new_thread (minus_one_ptid), not new_thread (ptid). Maciej's patch fixes this just as well (or perhaps better), so I've applied to mainline, and the 7.2 and 7.1 branches. -- Pedro Alves 2010-08-06 Maciej W. Rozycki <macro@codesourcery.com> gdb/ * thread.c (add_thread_silent): Use null_ptid instead of minus_one_ptid while getting rid of stale inferior_ptid. Index: gdb/thread.c =================================================================== --- gdb/thread.c (revision 283123) +++ gdb/thread.c (working copy) @@ -187,11 +187,11 @@ add_thread_silent (ptid_t ptid) if (ptid_equal (inferior_ptid, ptid)) { - tp = new_thread (ptid); + tp = new_thread (null_ptid); /* Make switch_to_thread not read from the thread. */ tp->state_ = THREAD_EXITED; - switch_to_thread (minus_one_ptid); + switch_to_thread (null_ptid); /* Now we can delete it. */ delete_thread (ptid); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-06 20:06 ` Pedro Alves @ 2010-08-06 20:50 ` Maciej W. Rozycki 2010-08-09 2:28 ` Hui Zhu 1 sibling, 0 replies; 10+ messages in thread From: Maciej W. Rozycki @ 2010-08-06 20:50 UTC (permalink / raw) To: Pedro Alves; +Cc: gdb-patches, Hui Zhu On Fri, 6 Aug 2010, Pedro Alves wrote: > Maciej's patch fixes this just as well (or perhaps better), so I've > applied to mainline, and the 7.2 and 7.1 branches. Thanks for pushing it. Maciej ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-06 20:06 ` Pedro Alves 2010-08-06 20:50 ` Maciej W. Rozycki @ 2010-08-09 2:28 ` Hui Zhu 2010-08-09 14:48 ` Pedro Alves 1 sibling, 1 reply; 10+ messages in thread From: Hui Zhu @ 2010-08-09 2:28 UTC (permalink / raw) To: Pedro Alves; +Cc: gdb-patches, Maciej W. Rozycki Thanks. It work on my part got: Reading symbols from /home/teawater/tmp/vmlinux...done. [New process 1] [New process 1] [New process 1] [New LWP 2731] #0 0xffffffff8020b037 in __sti_mwait () at include2/asm/processor.h:719 719 include2/asm/processor.h: No such file or directory. in include2/asm/processor.h (gdb) info threads 4 LWP 2731 crash_setup_regs (regs=0x0) at include2/asm/kexec.h:155 * 3 process 1 0xffffffff8020b037 in __sti_mwait () at include2/asm/processor.h:719 Looks this is a fit way without change libbfd. Best, Hui ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFA]corelow.c: Add tid to add_to_thread_list 2010-08-09 2:28 ` Hui Zhu @ 2010-08-09 14:48 ` Pedro Alves 0 siblings, 0 replies; 10+ messages in thread From: Pedro Alves @ 2010-08-09 14:48 UTC (permalink / raw) To: Hui Zhu; +Cc: gdb-patches, Maciej W. Rozycki On Monday 09 August 2010 03:28:23, Hui Zhu wrote: > It work on my part got: > > Reading symbols from /home/teawater/tmp/vmlinux...done. > [New process 1] > [New process 1] > [New process 1] > [New LWP 2731] > #0 0xffffffff8020b037 in __sti_mwait () at include2/asm/processor.h:719 > 719 include2/asm/processor.h: No such file or directory. > in include2/asm/processor.h > (gdb) info threads > 4 LWP 2731 crash_setup_regs (regs=0x0) at include2/asm/kexec.h:155 > * 3 process 1 0xffffffff8020b037 in __sti_mwait () at > include2/asm/processor.h:719 Thanks for confirming. > Looks this is a fit way without change libbfd. IMO, I'd still be better to teach bfd about these cores as I suggested before, but it doesn't itch me enough to go fix it myself. ;-) -- Pedro Alves ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2010-08-09 14:48 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-08-03 8:49 [RFA]corelow.c: Add tid to add_to_thread_list Hui Zhu 2010-08-05 18:44 ` Tom Tromey 2010-08-06 2:56 ` Hui Zhu 2010-08-06 9:57 ` Pedro Alves 2010-08-06 16:48 ` Hui Zhu 2010-08-06 17:18 ` Pedro Alves 2010-08-06 20:06 ` Pedro Alves 2010-08-06 20:50 ` Maciej W. Rozycki 2010-08-09 2:28 ` Hui Zhu 2010-08-09 14:48 ` Pedro Alves
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).