* thread register state information invalid in core files
@ 2006-03-28 14:36 Balazs Scheidler
2006-03-28 15:31 ` Daniel Jacobowitz
0 siblings, 1 reply; 5+ messages in thread
From: Balazs Scheidler @ 2006-03-28 14:36 UTC
To: gdb
Hi,
I have tried to analyze a core file generated by Linux kernel 2.6.12
(x86 processor, basically Debian sarge), without much success.
Gdb correctly shows the list of threads in the application, but apart
from the main thread, the backtrace information is unusable.
Main thread:
(gdb) thread 1
[Switching to thread 1 (process 31158)]#0 0x001e0523 in poll () from /lib/tls/libc.so.6
(gdb) bt
#0 0x001e0523 in poll () from /lib/tls/libc.so.6
#1 0x00688296 in g_main_loop_get_context () from /usr/lib/libglib-2.0.so.0
#2 0x00687890 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#3 0x00687b5d in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#4 0x005ad689 in z_main_loop (policy_file=0x8065240 "/etc/zorp/policy.py", instance_name=0x8064fe8 "intra_http", instance_policy_list=0x804c460) at zorp.c:165
#5 0x0804a2d8 in main (argc=1, argv=0xbffff9e4) at main.c:435
#6 0x00126974 in __libc_start_main () from /lib/tls/libc.so.6
#7 0x08049831 in _start () at ../sysdeps/i386/elf/start.S:102
Anything else:
(gdb) thread 2
[Switching to thread 2 (process 26119)]#0 0x00010202 in ?? ()
(gdb) bt
#0 0x00010202 in ?? ()
Cannot access memory at address 0x0
(gdb) info registers
eax 0xc010007b -1072693125
ecx 0x243948 2373960
edx 0x0 0
ebx 0x1f8 504
esp 0x0 0x0
ebp 0x7b 0x7b
esi 0x409272c 67708716
edi 0x243900 2373888
eip 0x10202 0x10202
eflags 0x7b 123
cs 0x26f4 9972
ss 0x0 0
ds 0xffff 65535
es 0x3965 14693
fs 0x0 0
gs 0x33 51
Looking at the value of ESP and EBP it is possible that gdb incorrectly
reads the stack-frame information. The funny part is that the segfault
itself occurred in PID 31158 (not the main thread, for sure),
but gdb lists PID 31158 as the main thread with the main thread's stack.
Luckily, I have some information from the system log, and I know the
address of the stack frame where the segfault occurred. If only
"frame *address" worked, I could unwind the stack easily, but it does
not.
I have looked at the core file using objdump, and it does seem to
contain information on various threads:
1 .reg/31158 00000044 00000000 00000000 00002110 2**2
CONTENTS
2 .reg 00000044 00000000 00000000 00002110 2**2
CONTENTS
3 .auxv 00000090 00000000 00000000 00002730 2**2
CONTENTS
4 .reg2/31158 0000006c 00000000 00000000 000027d4 2**2
CONTENTS
5 .reg2 0000006c 00000000 00000000 000027d4 2**2
CONTENTS
6 .reg-xfp/31158 00000200 00000000 00000000 00002854 2**2
CONTENTS
7 .reg-xfp 00000200 00000000 00000000 00002854 2**2
CONTENTS
8 .reg/26119 00000044 00000000 00000000 00002ab0 2**2
CONTENTS
9 .reg2/26119 0000006c 00000000 00000000 00002b0c 2**2
CONTENTS
10 .reg-xfp/26119 00000200 00000000 00000000 00002b8c 2**2
CONTENTS
11 .reg/26108 00000044 00000000 00000000 00002de8 2**2
CONTENTS
12 .reg2/26108 0000006c 00000000 00000000 00002e44 2**2
CONTENTS
13 .reg-xfp/26108 00000200 00000000 00000000 00002ec4 2**2
CONTENTS
But as I'm not fluent in core file structure I'm stumped. All this
applies to gdb 6.3. I've just finished compiling gdb 6.4 but it shows
the same symptoms.
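A minimal sketch, in Python, of how the per-thread pseudo-sections in the
objdump listing above can be grouped by thread ID. The names SECTION_NAMES
and group_by_thread are hypothetical, and the "<kind>/<id>" naming pattern is
inferred from the listing itself, not from any documentation.

```python
import re
from collections import defaultdict

# Section names copied from the objdump listing above (sizes omitted).
SECTION_NAMES = [
    ".reg/31158", ".reg", ".auxv", ".reg2/31158", ".reg2",
    ".reg-xfp/31158", ".reg-xfp",
    ".reg/26119", ".reg2/26119", ".reg-xfp/26119",
    ".reg/26108", ".reg2/26108", ".reg-xfp/26108",
]

# Assumed pattern, inferred from the listing: "<kind>/<thread id>".
PER_THREAD = re.compile(r"^(\.reg[\w-]*)/(\d+)$")


def group_by_thread(names):
    """Map each thread ID to the register pseudo-sections recorded for it."""
    threads = defaultdict(list)
    for name in names:
        match = PER_THREAD.match(name)
        if match:
            kind, tid = match.groups()
            threads[int(tid)].append(kind)
    return dict(threads)


if __name__ == "__main__":
    for tid, kinds in group_by_thread(SECTION_NAMES).items():
        print(tid, kinds)
```

On the listing above this gives three threads (31158, 26119, 26108), each
with .reg, .reg2 and .reg-xfp. The unsuffixed .reg, .reg2 and .reg-xfp
entries share their file offsets with the 31158 ones.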
Any help?
--
Bazsi
* Re: thread register state information invalid in core files
From: Daniel Jacobowitz @ 2006-03-28 15:31 UTC
To: Balazs Scheidler; +Cc: gdb
On Tue, Mar 28, 2006 at 12:43:45PM +0200, Balazs Scheidler wrote:
> Anything else:
> (gdb) thread 2
> [Switching to thread 2 (process 26119)]#0 0x00010202 in ?? ()
> (gdb) bt
> #0 0x00010202 in ?? ()
> Cannot access memory at address 0x0
> (gdb) info registers
> eax 0xc010007b -1072693125
> ecx 0x243948 2373960
> edx 0x0 0
> ebx 0x1f8 504
> esp 0x0 0x0
> ebp 0x7b 0x7b
> esi 0x409272c 67708716
> edi 0x243900 2373888
> eip 0x10202 0x10202
> eflags 0x7b 123
> cs 0x26f4 9972
> ss 0x0 0
> ds 0xffff 65535
> es 0x3965 14693
> fs 0x0 0
> gs 0x33 51
>
> Looking at the value of ESP and EBP it is possible that gdb incorrectly
> reads the stack-frame information.
It looks to me like the core file is just corrupt.
These registers are in the pseudo-sections you saw in objdump, in the
order the header files describe for an elf_gregset_t. You may want to
check the core file by hand; you can dump the sections using objdump -s
-j "sectionname".
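For checking by hand, a minimal Python sketch that wraps the objdump
invocation mentioned above. The helper names objdump_command and
dump_section and the file name "core" are assumptions made for illustration;
only the -s and -j options come from objdump itself.

```python
import subprocess


def objdump_command(core_path, section):
    """Build the argv for `objdump -s -j SECTION CORE`."""
    # -s prints the full section contents as hex, -j selects one section.
    return ["objdump", "-s", "-j", section, core_path]


def dump_section(core_path, section):
    """Run objdump and return its text output (raises if objdump fails)."""
    result = subprocess.run(
        objdump_command(core_path, section),
        check=True, capture_output=True, text=True,
    )
    return result.stdout

# Example (needs objdump and a core file named "core" in the current directory):
#   print(dump_section("core", ".reg/26119"))
```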
I remember having various problems with threaded core dumps in recent
kernels.
> The funny part is that the segfault
> itself occurred in the PID number 31158 (not the main thread for sure),
> but gdb lists pid 31158 as the main thread with the main thread's stack.
The kernel always dumps the faulting thread first.
--
Daniel Jacobowitz
CodeSourcery
* Re: thread register state information invalid in core files
From: Balazs Scheidler @ 2006-03-28 21:18 UTC
To: Daniel Jacobowitz; +Cc: gdb
On Tue, 2006-03-28 at 09:36 -0500, Daniel Jacobowitz wrote:
> On Tue, Mar 28, 2006 at 12:43:45PM +0200, Balazs Scheidler wrote:
> > Anything else:
> > (gdb) thread 2
> > [Switching to thread 2 (process 26119)]#0 0x00010202 in ?? ()
> > (gdb) bt
> > #0 0x00010202 in ?? ()
> > Cannot access memory at address 0x0
> > (gdb) info registers
> > eax 0xc010007b -1072693125
> > ecx 0x243948 2373960
> > edx 0x0 0
> > ebx 0x1f8 504
> > esp 0x0 0x0
> > ebp 0x7b 0x7b
> > esi 0x409272c 67708716
> > edi 0x243900 2373888
> > eip 0x10202 0x10202
> > eflags 0x7b 123
> > cs 0x26f4 9972
> > ss 0x0 0
> > ds 0xffff 65535
> > es 0x3965 14693
> > fs 0x0 0
> > gs 0x33 51
> >
> > Looking at the value of ESP and EBP it is possible that gdb incorrectly
> > reads the stack-frame information.
>
> It looks to me like the core file is just corrupt.
>
> These registers are in the pseudo-sections you saw in objdump, in the
> order the header files describe for an elf_gregset_t. You may want to
> check the core file by hand; you can dump the sections using objdump -s
> -j "sectionname".
>
> I remember having various problems with threaded core dumps in recent
> kernels.
This is the content of .reg/31158 (same as .reg):
Contents of section .reg/31158:
0000 68ee1008 05000000 bbb70000 00000000 h...............
0010 402f2400 28f7ffbf fcffffff 7b0010c0 @/$.(.......{...
0020 7b000000 00000000 33000000 a8000000 {.......3.......
0030 23051e00 73000000 46020000 1cf7ffbf #...s...F.......
0040 7b000000 {...
and .reg2/31158 (same as .reg2):
Contents of section .reg2/31158:
0000 7f032000 0000c901 c8c41500 73000000 .. .........s...
0010 9ce2ffbf 7b000000 801f0000 bd6f0200 ....{........o..
0020 00000000 ffffffff 01000000 0000ffff ................
0030 af3fffff f5130000 ffff818a feffffff .?..............
0040 0100ffff 00000000 000000e0 00400080 .............@..
0050 4a14f145 51882440 e0da89ea 3a9d5188 J..EQ.$@....:.Q.
0060 1d4000d8 89ea3a9d 51881d40 .@....:.Q..@
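To work on these dumps with a script instead of by eye, the rows can be
turned back into bytes. A minimal Python sketch follows; parse_objdump_row is
a hypothetical helper, and it assumes each row has an offset, at most four
hex groups, then the ASCII column, as in the output above. A short final row
whose ASCII column itself looks like hex digits would confuse it.

```python
import re

# A hex group as printed in the dumps above: 1 to 4 bytes, two digits each.
HEX_GROUP = re.compile(r"^(?:[0-9a-fA-F]{2}){1,4}$")


def parse_objdump_row(row):
    """Return (offset, data) for one row of `objdump -s` output."""
    tokens = row.split()
    offset = int(tokens[0], 16)
    data = bytearray()
    # At most four hex groups precede the ASCII column; stop at the first
    # token that is not a hex group.
    for token in tokens[1:5]:
        if not HEX_GROUP.match(token):
            break
        data += bytes.fromhex(token)
    return offset, bytes(data)


# First and last rows of the .reg/31158 dump above.
FIRST_ROW = "0000 68ee1008 05000000 bbb70000 00000000 h..............."
LAST_ROW = "0040 7b000000 {..."

if __name__ == "__main__":
    for row in (FIRST_ROW, LAST_ROW):
        offset, data = parse_objdump_row(row)
        print(f"{offset:#06x} {data.hex()}")
```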
If I understand your hint correctly, the registers should be read as follows:
#define ELF_CORE_COPY_REGS(pr_reg, regs) \
        pr_reg[0] = regs->ebx; \
        pr_reg[1] = regs->ecx; \
        pr_reg[2] = regs->edx; \
        pr_reg[3] = regs->esi; \
        pr_reg[4] = regs->edi; \
        pr_reg[5] = regs->ebp; \
        pr_reg[6] = regs->eax; \
        pr_reg[7] = regs->xds; \
        pr_reg[8] = regs->xes; \
        savesegment(fs,pr_reg[9]); \
        savesegment(gs,pr_reg[10]); \
        pr_reg[11] = regs->orig_eax; \
        pr_reg[12] = regs->eip; \
        pr_reg[13] = regs->xcs; \
        pr_reg[14] = regs->eflags; \
        pr_reg[15] = regs->esp; \
        pr_reg[16] = regs->xss;
This does seem to be the case ("info registers" output from gdb):
eax 0xfffffffc -4
ecx 0x5 5
edx 0xb7bb 47035
ebx 0x810ee68 135327336
esp 0xbffff71c 0xbffff71c
ebp 0xbffff728 0xbffff728
esi 0x0 0
edi 0x242f40 2371392
eip 0x1e0523 0x1e0523 <poll+131>
eflags 0x246 582
cs 0x73 115
ss 0x7b 123
ds 0xc010007b -1072693125
es 0x7b 123
fs 0x0 0
gs 0x33 51
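The same mapping can be replayed outside gdb. A minimal Python sketch,
assuming the 17 slots are little-endian 32-bit words in the
ELF_CORE_COPY_REGS order quoted above; GREG_NAMES, REG_31158 and
decode_gregset are names made up for this example.

```python
import struct

# Slot order of the register set, following ELF_CORE_COPY_REGS above.
GREG_NAMES = [
    "ebx", "ecx", "edx", "esi", "edi", "ebp", "eax", "ds", "es",
    "fs", "gs", "orig_eax", "eip", "cs", "eflags", "esp", "ss",
]

# Hex groups of .reg/31158, copied from the objdump output above.
REG_31158 = bytes.fromhex(
    "68ee1008 05000000 bbb70000 00000000"
    "402f2400 28f7ffbf fcffffff 7b0010c0"
    "7b000000 00000000 33000000 a8000000"
    "23051e00 73000000 46020000 1cf7ffbf"
    "7b000000"
)


def decode_gregset(raw):
    """Unpack 17 little-endian 32-bit slots into a name -> value dict."""
    values = struct.unpack("<17I", raw)
    return dict(zip(GREG_NAMES, values))


if __name__ == "__main__":
    for name, value in decode_gregset(REG_31158).items():
        print(f"{name:<9}{value:#x}")
```

The decoded ebx, ebp, eip and esp agree with what gdb prints, so gdb is
reading the section the way the kernel laid it out.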
However, the values are bogus: the valid ebp value for the crashing thread is
0x0409272c.
So it seems to be a kernel bug. Any hints where this was fixed or whether
it was fixed at all?
>
> > The funny part is that the segfault
> > itself occurred in the PID number 31158 (not the main thread for sure),
> > but gdb lists pid 31158 as the main thread with the main thread's stack.
>
> The kernel always dumps the faulting thread first.
Sure, but it has the context of the main thread.
--
Bazsi
* Re: thread register state information invalid in core files
From: Daniel Jacobowitz @ 2006-03-28 22:01 UTC
To: Balazs Scheidler; +Cc: gdb
On Tue, Mar 28, 2006 at 05:20:22PM +0200, Balazs Scheidler wrote:
> So it seems to be a kernel bug. Any hints where this was fixed or whether
> it was fixed at all?
Sorry, I've no idea. If it still occurs in current kernel.org kernels,
you may want to report it to the kernel bugzilla.
> > > The funny part is that the segfault
> > > itself occurred in the PID number 31158 (not the main thread for sure),
> > > but gdb lists pid 31158 as the main thread with the main thread's stack.
> >
> > The kernel always dumps the faulting thread first.
>
> Sure, but it has the context of the main thread.
Oh, I think I misunderstood you. You've got the main thread's
registers, but the faulting thread's PID? I have no idea how that
could happen!
--
Daniel Jacobowitz
CodeSourcery
* Re: thread register state information invalid in core files
From: Balazs Scheidler @ 2006-03-30 14:32 UTC
To: Daniel Jacobowitz; +Cc: gdb
On Tue, 2006-03-28 at 17:20 +0200, Balazs Scheidler wrote:
> On Tue, 2006-03-28 at 09:36 -0500, Daniel Jacobowitz wrote:
> > On Tue, Mar 28, 2006 at 12:43:45PM +0200, Balazs Scheidler wrote:
> > > Anything else:
> However the values are bogus. The valid ebp value for the crashing thread is
> 0x0409272c
>
> So it seems to be a kernel bug. Any hints where this was fixed or whether
> it was fixed at all?
For the record, the problem was fixed by this patch, supposedly
included in Linux 2.6.15:
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=557962a926c62a9c4bd79d6b36df873d4f8c51ef
Thanks for your hints on the problem.
--
Bazsi