From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-24660-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 10167 invoked by alias); 28 Mar 2006 15:20:29 -0000
Received: (qmail 10158 invoked by uid 22791); 28 Mar 2006 15:20:28 -0000
X-Spam-Check-By: sourceware.org
Received: from balabit.hu (HELO balabit.hu) (195.70.34.196)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 28 Mar 2006 15:20:27 +0000
Subject: Re: thread register state information invalid in core files
From: Balazs Scheidler <bazsi@balabit.hu>
To: Daniel Jacobowitz <drow@false.org>
Cc: gdb@sourceware.org
In-Reply-To: <20060328143647.GB30581@nevyn.them.org>
References: <1143542626.8742.12.camel@bzorp.balabit> 	 <20060328143647.GB30581@nevyn.them.org>
Content-Type: text/plain
Date: Tue, 28 Mar 2006 21:18:00 -0000
Message-Id: <1143559222.16757.11.camel@bzorp.balabit>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-03/txt/msg00187.txt.bz2

On Tue, 2006-03-28 at 09:36 -0500, Daniel Jacobowitz wrote:
> On Tue, Mar 28, 2006 at 12:43:45PM +0200, Balazs Scheidler wrote:
> > Anything else:
> > (gdb) thread 2
> > [Switching to thread 2 (process 26119)]#0  0x00010202 in ?? ()
> > (gdb) bt
> > #0  0x00010202 in ?? ()
> > Cannot access memory at address 0x0
> > (gdb) info registers
> > eax            0xc010007b       -1072693125
> > ecx            0x243948 2373960
> > edx            0x0      0
> > ebx            0x1f8    504
> > esp            0x0      0x0
> > ebp            0x7b     0x7b
> > esi            0x409272c        67708716
> > edi            0x243900 2373888
> > eip            0x10202  0x10202
> > eflags         0x7b     123
> > cs             0x26f4   9972
> > ss             0x0      0
> > ds             0xffff   65535
> > es             0x3965   14693
> > fs             0x0      0
> > gs             0x33     51
> > 
> > Looking at the values of ESP and EBP, it is possible that gdb reads the
> > stack-frame information incorrectly.
> 
> It looks to me like the core file is just corrupt.
> 
> These registers are in the pseudo-sections you saw in objdump, in the
> order the header files describe for an elf_gregset_t.  You may want to
> check the core file by hand; you can dump the sections using objdump -s
> -j "sectionname".
> 
> I remember having various problems with threaded core dumps in recent
> kernels.

This is the content of .reg/31158 (identical to .reg):

Contents of section .reg/31158:
 0000 68ee1008 05000000 bbb70000 00000000  h...............
 0010 402f2400 28f7ffbf fcffffff 7b0010c0  @/$.(.......{...
 0020 7b000000 00000000 33000000 a8000000  {.......3.......
 0030 23051e00 73000000 46020000 1cf7ffbf  #...s...F.......
 0040 7b000000                             {...

and of .reg2/31158 (identical to .reg2):

Contents of section .reg2/31158:
 0000 7f032000 0000c901 c8c41500 73000000  .. .........s...
 0010 9ce2ffbf 7b000000 801f0000 bd6f0200  ....{........o..
 0020 00000000 ffffffff 01000000 0000ffff  ................
 0030 af3fffff f5130000 ffff818a feffffff  .?..............
 0040 0100ffff 00000000 000000e0 00400080  .............@..
 0050 4a14f145 51882440 e0da89ea 3a9d5188  J..EQ.$@....:.Q.
 0060 1d4000d8 89ea3a9d 51881d40           .@....:.Q..@
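Each row of `objdump -s` output is an offset, up to four groups of four bytes, and an ASCII column. Below is a minimal sketch for turning such rows back into raw bytes so they can be checked by hand. It assumes objdump's usual spacing: single spaces between hex groups, and two or more before the ASCII column. `parse_objdump_row` is a hypothetical helper name and is not part of binutils.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse one row of `objdump -s` output, e.g.
 *   " 0000 68ee1008 05000000 bbb70000 00000000  h..............."
 * and append its bytes to out[].  Returns the number of bytes added,
 * or -1 for a malformed row or a full buffer.  Assumes (hypothetically)
 * objdump's usual layout: offset column, hex groups separated by one
 * space, ASCII column set off by at least two spaces. */
static int parse_objdump_row(const char *line, unsigned char *out, size_t cap)
{
    const char *p = line;
    int n = 0;

    while (*p == ' ')                       /* leading indentation */
        p++;
    while (isxdigit((unsigned char)*p))     /* skip the offset column */
        p++;

    for (;;) {
        if (*p != ' ')                      /* end of line or garbage */
            break;
        p++;
        if (*p == ' ' || *p == '\0' || *p == '\n')
            break;                          /* two spaces: ASCII column */
        /* Consume one hex group, two digits (one byte) at a time. */
        while (isxdigit((unsigned char)p[0]) && isxdigit((unsigned char)p[1])) {
            char hex[3] = { p[0], p[1], '\0' };
            if ((size_t)n >= cap)
                return -1;
            out[n++] = (unsigned char)strtoul(hex, NULL, 16);
            p += 2;
        }
        if (*p != ' ' && *p != '\0' && *p != '\n')
            return -1;                      /* stray character in a group */
    }
    return n;
}
```

Running it over the five rows of the .reg dump and concatenating the results gives the 68-byte (0x44) section; a group with a trailing stray character makes it return -1 instead of silently dropping data.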

If I understand your hint correctly, the registers are stored in the order in which the kernel's ELF_CORE_COPY_REGS macro fills them in:

#define ELF_CORE_COPY_REGS(pr_reg, regs)                \
        pr_reg[0] = regs->ebx;                          \
        pr_reg[1] = regs->ecx;                          \
        pr_reg[2] = regs->edx;                          \
        pr_reg[3] = regs->esi;                          \
        pr_reg[4] = regs->edi;                          \
        pr_reg[5] = regs->ebp;                          \
        pr_reg[6] = regs->eax;                          \
        pr_reg[7] = regs->xds;                          \
        pr_reg[8] = regs->xes;                          \
        savesegment(fs,pr_reg[9]);                      \
        savesegment(gs,pr_reg[10]);                     \
        pr_reg[11] = regs->orig_eax;                    \
        pr_reg[12] = regs->eip;                         \
        pr_reg[13] = regs->xcs;                         \
        pr_reg[14] = regs->eflags;                      \
        pr_reg[15] = regs->esp;                         \
        pr_reg[16] = regs->xss;

This does seem to be the case; here is the "info registers" output from gdb:

eax            0xfffffffc       -4
ecx            0x5      5
edx            0xb7bb   47035
ebx            0x810ee68        135327336
esp            0xbffff71c       0xbffff71c
ebp            0xbffff728       0xbffff728
esi            0x0      0
edi            0x242f40 2371392
eip            0x1e0523 0x1e0523 <poll+131>
eflags         0x246    582
cs             0x73     115
ss             0x7b     123
ds             0xc010007b       -1072693125
es             0x7b     123
fs             0x0      0
gs             0x33     51
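As a cross-check of that mapping, decoding the raw bytes of .reg/31158 as 17 little-endian 32-bit words in ELF_CORE_COPY_REGS order reproduces gdb's numbers. It also recovers the orig_eax slot, which "info registers" does not print. Below is a minimal sketch. It assumes an i386 core with a 68-byte gregset, and its byte values are copied from the objdump dump above. The enum and function names are illustrative, not kernel or gdb identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/* Slot order of pr_reg[] as filled in by ELF_CORE_COPY_REGS (i386).
 * Names are illustrative only. */
enum {
    R_EBX, R_ECX, R_EDX, R_ESI, R_EDI, R_EBP, R_EAX, R_DS, R_ES, R_FS,
    R_GS, R_ORIG_EAX, R_EIP, R_CS, R_EFLAGS, R_ESP, R_SS, R_NGREG
};

/* Decode a raw .reg section (17 little-endian 32-bit words) into
 * pr_reg[] order.  Returns 0 on success, -1 if the size is wrong. */
static int decode_i386_gregset(const unsigned char *raw, size_t len,
                               uint32_t regs[R_NGREG])
{
    if (len != (size_t)R_NGREG * 4)
        return -1;
    for (int i = 0; i < R_NGREG; i++) {
        const unsigned char *w = raw + 4 * i;
        regs[i] = (uint32_t)w[0] | (uint32_t)w[1] << 8 |
                  (uint32_t)w[2] << 16 | (uint32_t)w[3] << 24;
    }
    return 0;
}

/* The 68 bytes of .reg/31158, copied from the objdump -s output. */
static const unsigned char reg_31158[68] = {
    0x68,0xee,0x10,0x08, 0x05,0x00,0x00,0x00, 0xbb,0xb7,0x00,0x00, 0x00,0x00,0x00,0x00,
    0x40,0x2f,0x24,0x00, 0x28,0xf7,0xff,0xbf, 0xfc,0xff,0xff,0xff, 0x7b,0x00,0x10,0xc0,
    0x7b,0x00,0x00,0x00, 0x00,0x00,0x00,0x00, 0x33,0x00,0x00,0x00, 0xa8,0x00,0x00,0x00,
    0x23,0x05,0x1e,0x00, 0x73,0x00,0x00,0x00, 0x46,0x02,0x00,0x00, 0x1c,0xf7,0xff,0xbf,
    0x7b,0x00,0x00,0x00
};
```

With this data, the eip slot is 0x001e0523 and the orig_eax slot is 0xa8 (168, the i386 poll system call number). This is consistent with gdb placing eip at <poll+131>, so the section decodes cleanly; it just describes the wrong thread.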

However, the values are bogus: the valid ebp value for the crashing thread is
0x0409272c.

So this looks like a kernel bug. Any hints on where it was fixed, or whether
it was fixed at all?

> 
> > The funny part is that the segfault itself occurred in PID 31158
> > (definitely not the main thread), but gdb lists pid 31158 as the main
> > thread, with the main thread's stack.
> 
> The kernel always dumps the faulting thread first.

Sure, but the register set dumped for it contains the main thread's context.
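The dump order can be checked without gdb. As far as I know, BFD turns each NT_PRSTATUS note in the core's PT_NOTE segment into a .reg/<pid> section and also exposes the first one as plain .reg. Listing the pr_pid of each NT_PRSTATUS note in file order therefore shows which thread the kernel wrote first. Below is a minimal sketch. It assumes an i386 core read on a little-endian host, and an elf_prstatus layout with pr_pid at byte offset 24. Both `prstatus_pids` and `I386_PRSTATUS_PID_OFF` are hypothetical names.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed offset of pr_pid in struct elf_prstatus on i386:
 * 12-byte elf_siginfo, short pr_cursig + 2 bytes padding,
 * pr_sigpend, pr_sighold, then pr_pid. */
#define I386_PRSTATUS_PID_OFF 24

/* Walk the raw contents of a PT_NOTE segment and store the pr_pid of
 * each NT_PRSTATUS note, in file order, into pids[] (at most max).
 * Returns the number stored, or -1 if a note runs past the buffer.
 * Assumes the core's byte order matches the host's. */
static int prstatus_pids(const unsigned char *notes, size_t len,
                         int32_t *pids, int max)
{
    size_t off = 0;
    int n = 0;

    while (off + sizeof(Elf32_Nhdr) <= len) {
        Elf32_Nhdr nh;
        memcpy(&nh, notes + off, sizeof nh);    /* avoid unaligned reads */
        off += sizeof nh;

        /* Name and descriptor are each padded to 4-byte alignment. */
        size_t namesz = ((size_t)nh.n_namesz + 3) & ~(size_t)3;
        size_t descsz = ((size_t)nh.n_descsz + 3) & ~(size_t)3;
        if (namesz > len - off || descsz > len - off - namesz)
            return -1;

        const unsigned char *desc = notes + off + namesz;
        if (nh.n_type == NT_PRSTATUS &&
            nh.n_descsz >= I386_PRSTATUS_PID_OFF + 4 && n < max) {
            memcpy(&pids[n], desc + I386_PRSTATUS_PID_OFF, sizeof pids[n]);
            n++;
        }
        off += namesz + descsz;
    }
    return n;
}
```

Fed the bytes of the PT_NOTE segment (located via its p_offset and p_filesz in the program header table), the first pid returned should be the faulting thread. That makes it easy to compare against the register values stored in that same note.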

-- 
Bazsi