From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Sender: gdb-owner@sources.redhat.com
Date: Wed, 16 Apr 2003 14:42:00 -0000
From: Daniel Jacobowitz
To: Elena Zannoni
Cc: gdb@sources.redhat.com, roland@redhat.com
Subject: Re: Linux kernel problem -- food for thoughts
Message-ID: <20030416144158.GA10060@nevyn.them.org>
Mail-Followup-To: Elena Zannoni, gdb@sources.redhat.com, roland@redhat.com
References: <16029.26499.985342.118733@localhost.redhat.com> <20030416142811.GA9574@nevyn.them.org> <16029.27648.37989.683217@localhost.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <16029.27648.37989.683217@localhost.redhat.com>
User-Agent: Mutt/1.5.1i
X-SW-Source: 2003-04/txt/msg00159.txt.bz2

On Wed, Apr 16, 2003 at 10:43:12AM -0400, Elena Zannoni wrote:
> Daniel Jacobowitz writes:
> > On Wed, Apr 16, 2003 at 10:24:03AM -0400, Elena Zannoni wrote:
> > >
> > > Gdb is currently having a 'little problem' backtracing out of system
> > > calls in x86 kernels which support NPTL. I think the current public
> > > 2.5 kernel would make this problem show up.
> > >
> > > Right now, if you are in system calls the backtrace will show up as:
> > >
> > > 0xffffe002 in ??
> >
> > I was just thinking about this.
> > My reaction is:
> > - the page needs to be readable; I vaguely remember badgering Linus
> > about this and getting it fixed, but it might have been someone else,
> > or it might not have gotten fixed.
> > - GDB needs to get the location of the EH information from glibc
> > somehow. My instinct is to make glibc export this in a global symbol,
> > just like the way we get signal numbers from linuxthreads.
> >
> > How does that sound?
>
> Roland (but I'll let him speak) has had a thought about creating a
> /proc/pid/vsyscall file, which then gdb could read with add-symbol-file....
>
> the page is readable right now in 2.5 and the patch for the .eh_frame
> has been integrated.
>
> core files will also need to be addressed.

Oww. Should we just include the page in core dumps? That might be the
simplest solution.

Roland, I think that doing it in glibc is a better idea than doing it in
/proc somewhere; it will make remote debugging and core debugging more
straightforward.

> > Note that we don't use eh information on i386 yet. We need to fix
> > that. I tried once and got distracted by another project, I think :)
>
> Yep, of course.
>
> elena
>
> > >
> > >
> > > Here is an explanation of the problem that Roland has provided:
> > >
> > > ---------------
> > > Previously asm or C code in libc entered the kernel by setting some
> > > registers and using the "int $0x80" instruction. e.g.
> > >
> > > 00000000 <__getpid>:
> > >    0:   b8 14 00 00 00          mov    $0x14,%eax
> > >    5:   cd 80                   int    $0x80
> > >    7:   c3                      ret
> > >
> > > That is the function called __getpid in libc, the pre-NPTL build. (In the
> > > shared library you will see this if you've run with LD_ASSUME_KERNEL=2.4.1
> > > so that /lib/i686/libc.so.6 is what you are using.)
> > >
> > > In the new libc (/lib/tls/libc.so.6), that function looks like this:
> > >
> > > 00000000 <__getpid>:
> > >    0:   b8 14 00 00 00          mov    $0x14,%eax
> > >    5:   65 ff 15 10 00 00 00    call   *%gs:0x10
> > >    c:   c3                      ret
> > >
> > > %gs:0x10 is a location that has been initialized to a kernel-supplied
> > > special entry point address. In the current kernels, that address is
> > > always 0xffffe000. But that is not part of the ABI, which is why it's
> > > indirect instead of a literal "call 0xffffe000". The kernel supplies the
> > > actual entry point address to libc at startup time, and nothing in the
> > > kernel-user interface prevents it from using a different address in each
> > > process if it chose to.
> > >
> > > The reason for this is that there can be multiple ways to enter the kernel,
> > > not just the "int $0x80" trap instruction. Some kernels on some hardware
> > > may use a different method that performs better. By using this
> > > kernel-supplied entry point address, no user code has to be changed to
> > > select the method. It's entirely the kernel's choice.
> > >
> > > In all the RH kernels we have right now, the entry point page contains:
> > >
> > > 0xffffe000: int $0x80
> > > 0xffffe002: ret
> > >
> > > But user code cannot presume what this code sequence looks like exactly.
> > > It will be some sequence of register and stack moves and special trap
> > > instructions, but you have to disassemble to know exactly. In the case
> > > above, the PC value seen while a thread is in the kernel is 0xffffe002.
> > > You can disassemble the "ret" there and see that you have to pop the PC off
> > > the stack to recover the caller's frame.
> > >
> > > Another example of what this code might look like when you disassemble it is:
> > >
> > > 0xffffe000: push %ecx
> > > 0xffffe001: push %edx
> > > 0xffffe002: push %ebp
> > > 0xffffe003: mov %esp,%ebp
> > > 0xffffe005: sysenter
> > > 0xffffe007: nop
> > > 0xffffe008: nop
> > > 0xffffe009: nop
> > > 0xffffe00a: nop
> > > 0xffffe00b: nop
> > > 0xffffe00c: nop
> > > 0xffffe00d: nop
> > > 0xffffe00e: jmp 0xffffe003
> > > 0xffffe010: pop %ebp
> > > 0xffffe011: pop %edx
> > > 0xffffe012: pop %ecx
> > > 0xffffe013: ret
> > >
> > > In this example, depending on what happened inside the kernel the PC you
> > > usually see may be either 0xffffe00e or 0xffffe010. If the process gets a
> > > signal or you attach asynchronously or so forth, the PC might be at any of
> > > the earlier instructions as well. You cannot rely on exactly what the
> > > sequence is, so you must be able to disassemble from where you are and
> > > cope. In this case you will most often see 0xffffe010, in which case you
> > > need to pop those three registers and the PC off the stack to restore the
> > > caller's frame.
> > >
> > > So, these cases are like a leaf function with no debugging info. The
> > > first solution idea was interpreting the epilogue code. It will
> > > probably be safe to assume that it looks like epilogue code normally
> > > does, i.e. register pops and not any arbitrary instructions.
> > >
> > > Another solution I was considering is to have the system somewhere provide
> > > DWARF unwind info matching the possible PC addresses in the vsyscall page.
> > > I am now pretty sure this is the way to go. The recent development is that
> > > NPTL now needs .eh_frame information for these PCs as well, and Ulrich has
> > > made a kernel change to provide it. The .eh_frame info for the vsyscall
> > > PCs is on the same read-only kernel page.
> > > The C library now uses this as
> > > if the vsyscall page were a DSO with .eh_frame info to register, so that
> > > exception-style unwinding from any valid PC in a magic entry point works.
> > >
> > > So, there is a .eh_frame section available for this code, and getting it
> > > from where it is into gdb can be done by hook or by crook. I have the
> > > impression that gdb turning an available .eh_frame section into happy
> > > backtraces is something that might be expected real soon now.
> > > Sounds like a winner.
> > >
> > > I think that elucidates all but the dreariest bits of the technical issues.
> > > Now the practical questions. Oh, one dreary bit: 83172 mostly talks about
> > > the fact that ptrace refuses to read the 0xffffe000 page for you, which is
> > > presumed a prerequisite for dealing with the real can of worms (unwinding).
> > >
> > > --------------------
> > >
> > >
> > > I think right now the public 2.5 kernel has a fix to make the page
> > > readable, and another one to provide the .eh_frame information. There
> > > is no mechanism yet to make that debug info accessible to gdb.
> > >
> > >
> > > elena
> > >
> >
> > --
> > Daniel Jacobowitz
> > MontaVista Software                         Debian GNU/Linux Developer
>

--
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer
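
For illustration only, the "interpret the epilogue" idea from Roland's
explanation can be sketched as a toy model in Python. Nothing below is GDB
code: the function name unwind_from_epilogue, the flat bytes-plus-base-address
model of memory, and the two page images are assumptions made for this sketch.
The page bytes were hand-assembled from the two listings quoted above; the
thread itself shows no raw bytes. Starting from the stopped PC, the sketch
accepts only `pop r32` and `ret`, replaying each against the stack to recover
the caller's PC, stack pointer, and restored registers.

```python
import struct

# Register names indexed by the low three bits of the `pop r32` opcode (0x58 + r).
POP_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
OP_RET = 0xC3
OP_POP_ESP = 0x5C  # `pop %esp` rewrites the stack pointer; this sketch rejects it

# Hypothetical images of the two entry-point pages, hand-assembled from the
# listings in the quoted explanation (an assumption, not data from the thread).
INT80_PAGE = bytes.fromhex("cd 80 c3")  # int $0x80; ret
SYSENTER_PAGE = bytes.fromhex(
    "51 52 55 89 e5 0f 34 "   # push %ecx; push %edx; push %ebp; mov %esp,%ebp; sysenter
    "90 90 90 90 90 90 90 "   # seven nops
    "eb f3 5d 5a 59 c3"       # jmp 0xffffe003; pop %ebp; pop %edx; pop %ecx; ret
)


def unwind_from_epilogue(code, code_base, stack, stack_base, pc, esp):
    """Replay pops and a final ret starting at pc.

    Returns (caller_pc, caller_esp, restored_registers).
    Raises ValueError on anything that is not a plain epilogue instruction.
    """
    restored = {}
    while True:
        offset = pc - code_base
        if not 0 <= offset < len(code):
            raise ValueError(f"pc {pc:#x} is outside the entry-point page")
        opcode = code[offset]
        # Both `pop` and `ret` consume the 32-bit little-endian word at ESP.
        (word,) = struct.unpack_from("<I", stack, esp - stack_base)
        if opcode == OP_RET:
            return word, esp + 4, restored
        if 0x58 <= opcode <= 0x5F and opcode != OP_POP_ESP:
            restored[POP_REGS[opcode - 0x58]] = word
            pc += 1   # `pop r32` is a one-byte instruction
            esp += 4
            continue
        raise ValueError(f"not an epilogue instruction at {pc:#x}: {opcode:#04x}")
```

With the sysenter-style page, calling this at PC 0xffffe010 pops %ebp, %edx
and %ecx and then the return address; with the int $0x80 page, calling it at
0xffffe002 pops only the return address. A stop at 0xffffe00e (the jmp) or at
any of the earlier instructions raises ValueError here, which is the
weakness of this approach that the quoted text points out, and one reason the
.eh_frame route looks more robust.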