thread register state information invalid in core files

public inbox for gdb@sourceware.org

* thread register state information invalid in core files
@ 2006-03-28 14:36 Balazs Scheidler
  2006-03-28 15:31 ` Daniel Jacobowitz
  0 siblings, 1 reply; 5+ messages in thread
From: Balazs Scheidler @ 2006-03-28 14:36 UTC (permalink / raw)
  To: gdb

Hi,

I have tried to analyze a core file generated by Linux kernel 2.6.12
(x86 processor, basically Debian sarge), without much success.

Gdb correctly shows the list of threads in the application, but apart
from the main thread, the backtrace information is unusable.

Main thread:
(gdb) thread 1
[Switching to thread 1 (process 31158)]#0  0x001e0523 in poll () from /lib/tls/libc.so.6
(gdb) bt
#0  0x001e0523 in poll () from /lib/tls/libc.so.6
#1  0x00688296 in g_main_loop_get_context () from /usr/lib/libglib-2.0.so.0
#2  0x00687890 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#3  0x00687b5d in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#4  0x005ad689 in z_main_loop (policy_file=0x8065240 "/etc/zorp/policy.py", instance_name=0x8064fe8 "intra_http", instance_policy_list=0x804c460) at zorp.c:165
#5  0x0804a2d8 in main (argc=1, argv=0xbffff9e4) at main.c:435
#6  0x00126974 in __libc_start_main () from /lib/tls/libc.so.6
#7  0x08049831 in _start () at ../sysdeps/i386/elf/start.S:102

Anything else:
(gdb) thread 2
[Switching to thread 2 (process 26119)]#0  0x00010202 in ?? ()
(gdb) bt
#0  0x00010202 in ?? ()
Cannot access memory at address 0x0
(gdb) info registers
eax            0xc010007b       -1072693125
ecx            0x243948 2373960
edx            0x0      0
ebx            0x1f8    504
esp            0x0      0x0
ebp            0x7b     0x7b
esi            0x409272c        67708716
edi            0x243900 2373888
eip            0x10202  0x10202
eflags         0x7b     123
cs             0x26f4   9972
ss             0x0      0
ds             0xffff   65535
es             0x3965   14693
fs             0x0      0
gs             0x33     51

Looking at the values of ESP and EBP, it is possible that gdb reads the
stack-frame information incorrectly. The funny part is that the segfault
itself occurred in PID 31158 (not the main thread, for sure), but gdb
lists PID 31158 as the main thread, with the main thread's stack.

Luckily, I have some information from the system log, and I know the
address of the stack frame where the segfault occurred. If only
"frame *address" worked, I could unwind the stack easily, but it does
not.

I have looked at the core file using objdump, and it does seem to
contain information on various threads:

  1 .reg/31158    00000044  00000000  00000000  00002110  2**2
                  CONTENTS
  2 .reg          00000044  00000000  00000000  00002110  2**2
                  CONTENTS
  3 .auxv         00000090  00000000  00000000  00002730  2**2
                  CONTENTS
  4 .reg2/31158   0000006c  00000000  00000000  000027d4  2**2
                  CONTENTS
  5 .reg2         0000006c  00000000  00000000  000027d4  2**2
                  CONTENTS
  6 .reg-xfp/31158 00000200  00000000  00000000  00002854  2**2
                  CONTENTS
  7 .reg-xfp      00000200  00000000  00000000  00002854  2**2
                  CONTENTS
  8 .reg/26119    00000044  00000000  00000000  00002ab0  2**2
                  CONTENTS
  9 .reg2/26119   0000006c  00000000  00000000  00002b0c  2**2
                  CONTENTS
 10 .reg-xfp/26119 00000200  00000000  00000000  00002b8c  2**2
                  CONTENTS
 11 .reg/26108    00000044  00000000  00000000  00002de8  2**2
                  CONTENTS
 12 .reg2/26108   0000006c  00000000  00000000  00002e44  2**2
                  CONTENTS
 13 .reg-xfp/26108 00000200  00000000  00000000  00002ec4  2**2
                  CONTENTS
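
The per-thread layout in that listing can be pulled out mechanically. A minimal sketch, assuming the objdump section listing has been pasted into a string (the CONTENTS lines are left out here); the names LISTING, SECTION_RE and per_thread_sections are illustrative only and are not part of gdb or binutils:

```python
import re

# Section listing copied from the objdump output above (CONTENTS lines omitted).
LISTING = """\
  1 .reg/31158    00000044  00000000  00000000  00002110  2**2
  2 .reg          00000044  00000000  00000000  00002110  2**2
  3 .auxv         00000090  00000000  00000000  00002730  2**2
  4 .reg2/31158   0000006c  00000000  00000000  000027d4  2**2
  5 .reg2         0000006c  00000000  00000000  000027d4  2**2
  6 .reg-xfp/31158 00000200  00000000  00000000  00002854  2**2
  7 .reg-xfp      00000200  00000000  00000000  00002854  2**2
  8 .reg/26119    00000044  00000000  00000000  00002ab0  2**2
  9 .reg2/26119   0000006c  00000000  00000000  00002b0c  2**2
 10 .reg-xfp/26119 00000200  00000000  00000000  00002b8c  2**2
 11 .reg/26108    00000044  00000000  00000000  00002de8  2**2
 12 .reg2/26108   0000006c  00000000  00000000  00002e44  2**2
 13 .reg-xfp/26108 00000200  00000000  00000000  00002ec4  2**2
"""

# Match only the per-thread sections, i.e. names of the form ".reg*/<id>".
SECTION_RE = re.compile(r"^\s*\d+\s+(\.reg[\w-]*)/(\d+)\s+([0-9a-f]{8})\s", re.M)


def per_thread_sections(listing):
    """Map each thread id to {section name: size in bytes}."""
    threads = {}
    for name, tid, size in SECTION_RE.findall(listing):
        threads.setdefault(int(tid), {})[name] = int(size, 16)
    return threads


threads = per_thread_sections(LISTING)
for tid, sections in threads.items():
    print(tid, sections)
```

Each thread has a 0x44-byte .reg section, which is 17 four-byte words. The unsuffixed .reg, .reg2 and .reg-xfp entries share their file offsets with the 31158 copies, so the sketch skips them.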

But as I'm not fluent in the core file structure, I'm stumped. All this
applies to gdb 6.3. I've just finished compiling gdb 6.4, but it shows
the same symptoms.

Any help?

-- 
Bazsi

* Re: thread register state information invalid in core files
  2006-03-28 14:36 thread register state information invalid in core files Balazs Scheidler
@ 2006-03-28 15:31 ` Daniel Jacobowitz
  2006-03-28 21:18   ` Balazs Scheidler
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Jacobowitz @ 2006-03-28 15:31 UTC (permalink / raw)
  To: Balazs Scheidler; +Cc: gdb

On Tue, Mar 28, 2006 at 12:43:45PM +0200, Balazs Scheidler wrote:
> Anything else:
> (gdb) thread 2
> [Switching to thread 2 (process 26119)]#0  0x00010202 in ?? ()
> (gdb) bt
> #0  0x00010202 in ?? ()
> Cannot access memory at address 0x0
> (gdb) info registers
> eax            0xc010007b       -1072693125
> ecx            0x243948 2373960
> edx            0x0      0
> ebx            0x1f8    504
> esp            0x0      0x0
> ebp            0x7b     0x7b
> esi            0x409272c        67708716
> edi            0x243900 2373888
> eip            0x10202  0x10202
> eflags         0x7b     123
> cs             0x26f4   9972
> ss             0x0      0
> ds             0xffff   65535
> es             0x3965   14693
> fs             0x0      0
> gs             0x33     51
> 
> Looking at the value of ESP and EBP it is possible that gdb incorrectly 
> reads the stack-frame information.

It looks to me like the core file is just corrupt.

These registers are in the pseudo-sections you saw in objdump, in the
order the header files describe for an elf_gregset_t.  You may want to
check the core file by hand; you can dump the sections using objdump -s
-j "sectionname".

I remember having various problems with threaded core dumps in recent
kernels.

> The funny part that the segfault
> itself occurred in the PID number 31158 (not the main thread for sure),
> but gdb lists pid 31158 as the main thread with the main thread's stack.

The kernel always dumps the faulting thread first.


-- 
Daniel Jacobowitz
CodeSourcery

* Re: thread register state information invalid in core files
  2006-03-28 15:31 ` Daniel Jacobowitz
@ 2006-03-28 21:18   ` Balazs Scheidler
  2006-03-28 22:01     ` Daniel Jacobowitz
  2006-03-30 14:32     ` Balazs Scheidler
  0 siblings, 2 replies; 5+ messages in thread
From: Balazs Scheidler @ 2006-03-28 21:18 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb

On Tue, 2006-03-28 at 09:36 -0500, Daniel Jacobowitz wrote:
> On Tue, Mar 28, 2006 at 12:43:45PM +0200, Balazs Scheidler wrote:
> > Anything else:
> > (gdb) thread 2
> > [Switching to thread 2 (process 26119)]#0  0x00010202 in ?? ()
> > (gdb) bt
> > #0  0x00010202 in ?? ()
> > Cannot access memory at address 0x0
> > (gdb) info registers
> > eax            0xc010007b       -1072693125
> > ecx            0x243948 2373960
> > edx            0x0      0
> > ebx            0x1f8    504
> > esp            0x0      0x0
> > ebp            0x7b     0x7b
> > esi            0x409272c        67708716
> > edi            0x243900 2373888
> > eip            0x10202  0x10202
> > eflags         0x7b     123
> > cs             0x26f4   9972
> > ss             0x0      0
> > ds             0xffff   65535
> > es             0x3965   14693
> > fs             0x0      0
> > gs             0x33     51
> > 
> > Looking at the value of ESP and EBP it is possible that gdb incorrectly 
> > reads the stack-frame information.
> 
> It looks to me like the core file is just corrupt.
> 
> These registers are in the pseudo-sections you saw in objdump, in the
> order the header files describe for an elf_gregset_t.  You may want to
> check the core file by hand; you can dump the sections using objdump -s
> -j "sectionname".
> 
> I remember having various problems with threaded core dumps in recent
> kernels.

This is the content of .reg/31158 (same as .reg)

Contents of section .reg/31158:
 0000 68ee1008 05000000 bbb70000 00000000  h...............
 0010 402f2400 28f7ffbf fcffffff 7b0010c0  @/$.(.......{...
 0020 7b000000 00000000 33000000 a8000000  {.......3.......
 0030 23051e00 73000000 46020000 1cf7ffbf  #...s...F.......
 0040 7b000000                             {...

and .reg2/31158 (same as .reg2)

Contents of section .reg2/31158:
 0000 7f032000 0000c901 c8c41500 73000000  .. .........s...
 0010 9ce2ffbf 7b000000 801f0000 bd6f0200  ....{........o..
 0020 00000000 ffffffff 01000000 0000ffff  ................
 0030 af3fffff f5130000 ffff818a feffffff  .?..............
 0040 0100ffff 00000000 000000e0 00400080  .............@..
 0050 4a14f145 51882440 e0da89ea 3a9d5188  J..EQ.$@....:.Q.
 0060 1d4000d8 89ea3a9d 51881d40           .@....:.Q..@
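
To work with these dumps outside objdump, the hex columns can be turned back into raw bytes. A minimal sketch, assuming the text is `objdump -s` output in the layout shown above (offset, groups of hex digits, then an ASCII column separated by at least two spaces); parse_objdump_hex and DUMP are illustrative names, not an existing tool:

```python
import re

# The .reg/31158 dump, copied from the objdump -s output above.
DUMP = """\
Contents of section .reg/31158:
 0000 68ee1008 05000000 bbb70000 00000000  h...............
 0010 402f2400 28f7ffbf fcffffff 7b0010c0  @/$.(.......{...
 0020 7b000000 00000000 33000000 a8000000  {.......3.......
 0030 23051e00 73000000 46020000 1cf7ffbf  #...s...F.......
 0040 7b000000                             {...
"""


def parse_objdump_hex(dump):
    """Return the raw bytes encoded in the hex columns of an objdump -s dump."""
    data = bytearray()
    for line in dump.splitlines():
        # Data lines are indented; the "Contents of section" header is not.
        if not line.startswith(" ") or not line.strip():
            continue
        _offset, rest = line.strip().split(" ", 1)
        # The hex groups end where the first run of two or more spaces begins.
        hex_field = re.split(r"\s{2,}", rest, maxsplit=1)[0]
        data += bytes.fromhex(hex_field.replace(" ", ""))
    return bytes(data)


raw = parse_objdump_hex(DUMP)
print(len(raw), raw[:4].hex())
```

The ASCII column is ignored on purpose: it is lossy (every unprintable byte becomes a dot), so only the hex groups are trusted.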

If I understand your hint correctly, the registers should be read as follows:

#define ELF_CORE_COPY_REGS(pr_reg, regs)                \
        pr_reg[0] = regs->ebx;                          \
        pr_reg[1] = regs->ecx;                          \
        pr_reg[2] = regs->edx;                          \
        pr_reg[3] = regs->esi;                          \
        pr_reg[4] = regs->edi;                          \
        pr_reg[5] = regs->ebp;                          \
        pr_reg[6] = regs->eax;                          \
        pr_reg[7] = regs->xds;                          \
        pr_reg[8] = regs->xes;                          \
        savesegment(fs,pr_reg[9]);                      \
        savesegment(gs,pr_reg[10]);                     \
        pr_reg[11] = regs->orig_eax;                    \
        pr_reg[12] = regs->eip;                         \
        pr_reg[13] = regs->xcs;                         \
        pr_reg[14] = regs->eflags;                      \
        pr_reg[15] = regs->esp;                         \
        pr_reg[16] = regs->xss;

This does seem to be the case (compare the "info registers" output from gdb):

eax            0xfffffffc       -4
ecx            0x5      5
edx            0xb7bb   47035
ebx            0x810ee68        135327336
esp            0xbffff71c       0xbffff71c
ebp            0xbffff728       0xbffff728
esi            0x0      0
edi            0x242f40 2371392
eip            0x1e0523 0x1e0523 <poll+131>
eflags         0x246    582
cs             0x73     115
ss             0x7b     123
ds             0xc010007b       -1072693125
es             0x7b     123
fs             0x0      0
gs             0x33     51
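
The mapping can be checked by decoding the .reg/31158 bytes by hand. A minimal sketch, assuming 17 little-endian 32-bit words in the order given by the ELF_CORE_COPY_REGS macro above; REG_ORDER, REG_31158 and decode_gregset are illustrative names, not a gdb or kernel interface:

```python
import struct

# Register order taken from the ELF_CORE_COPY_REGS macro quoted above.
REG_ORDER = (
    "ebx", "ecx", "edx", "esi", "edi", "ebp", "eax", "ds", "es",
    "fs", "gs", "orig_eax", "eip", "cs", "eflags", "esp", "ss",
)

# Raw contents of .reg/31158, copied from the objdump -s output above.
REG_31158 = bytes.fromhex(
    "68ee1008 05000000 bbb70000 00000000 "
    "402f2400 28f7ffbf fcffffff 7b0010c0 "
    "7b000000 00000000 33000000 a8000000 "
    "23051e00 73000000 46020000 1cf7ffbf "
    "7b000000"
)


def decode_gregset(raw):
    """Decode an i386 .reg section into a {register name: value} dict."""
    expected = 4 * len(REG_ORDER)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    # "<17I": 17 unsigned 32-bit integers, little-endian.
    values = struct.unpack(f"<{len(REG_ORDER)}I", raw)
    return dict(zip(REG_ORDER, values))


regs = decode_gregset(REG_31158)
for name, value in regs.items():
    print(f"{name:<9}{value:#x}")
```

Decoded this way, ebp comes out as 0xbffff728 and eip as 0x1e0523, which agrees with what gdb prints.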

However, the values are bogus. The valid ebp value for the crashing thread is
0x0409272c.

So it seems to be a kernel bug. Any hints where this was fixed or whether 
it was fixed at all?

> 
> > The funny part that the segfault
> > itself occurred in the PID number 31158 (not the main thread for sure),
> > but gdb lists pid 31158 as the main thread with the main thread's stack.
> 
> The kernel always dumps the faulting thread first.

Sure, but it has the context of the main thread.

-- 
Bazsi

* Re: thread register state information invalid in core files
  2006-03-28 21:18   ` Balazs Scheidler
@ 2006-03-28 22:01     ` Daniel Jacobowitz
  2006-03-30 14:32     ` Balazs Scheidler
  1 sibling, 0 replies; 5+ messages in thread
From: Daniel Jacobowitz @ 2006-03-28 22:01 UTC (permalink / raw)
  To: Balazs Scheidler; +Cc: gdb

On Tue, Mar 28, 2006 at 05:20:22PM +0200, Balazs Scheidler wrote:
> So it seems to be a kernel bug. Any hints where this was fixed or whether 
> it was fixed at all?

Sorry, I've no idea.  If it still occurs in current kernel.org kernels,
you may want to report it to the kernel bugzilla.

> > > The funny part that the segfault
> > > itself occurred in the PID number 31158 (not the main thread for sure),
> > > but gdb lists pid 31158 as the main thread with the main thread's stack.
> > 
> > The kernel always dumps the faulting thread first.
> 
> Sure, but it has the context of the main thread.

Oh, I think I misunderstood you.  You've got the main thread's
registers, but the faulting thread's PID?  I have no idea how that
could happen!

-- 
Daniel Jacobowitz
CodeSourcery

* Re: thread register state information invalid in core files
  2006-03-28 21:18   ` Balazs Scheidler
  2006-03-28 22:01     ` Daniel Jacobowitz
@ 2006-03-30 14:32     ` Balazs Scheidler
  1 sibling, 0 replies; 5+ messages in thread
From: Balazs Scheidler @ 2006-03-30 14:32 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb

On Tue, 2006-03-28 at 17:20 +0200, Balazs Scheidler wrote:
> On Tue, 2006-03-28 at 09:36 -0500, Daniel Jacobowitz wrote:
> > On Tue, Mar 28, 2006 at 12:43:45PM +0200, Balazs Scheidler wrote:
> > > Anything else:

> However the values are bogus. The valid ebp value for the crashing thread is 
> 0x0409272c
> 
> So it seems to be a kernel bug. Any hints where this was fixed or whether 
> it was fixed at all?

For the record, the problem was fixed with this patch, supposedly
included in Linux 2.6.15:

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=557962a926c62a9c4bd79d6b36df873d4f8c51ef

Thanks for your hints on the problem.

-- 
Bazsi
