From: Elena Zannoni
Date: Wed, 16 Apr 2003 14:38:00 -0000
To: Daniel Jacobowitz
Cc: Elena Zannoni, gdb@sources.redhat.com, roland@redhat.com
Subject: Re: Linux kernel problem -- food for thoughts
Message-ID: <16029.27648.37989.683217@localhost.redhat.com>
In-Reply-To: <20030416142811.GA9574@nevyn.them.org>
References: <16029.26499.985342.118733@localhost.redhat.com> <20030416142811.GA9574@nevyn.them.org>

Daniel Jacobowitz writes:
> On Wed, Apr 16, 2003 at 10:24:03AM -0400, Elena Zannoni wrote:
> > 
> > Gdb is currently having a 'little problem' backtracing out of system
> > calls in x86 kernels which support NPTL. I think the current public
> > 2.5 kernel would make this problem show up.
> > 
> > Right now, if you are in system calls the backtrace will show up as:
> > 
> > 0xffffe002 in ??
> 
> I was just thinking about this. My reaction is:
>  - the page needs to be readable; I vaguely remember badgering Linus
>    about this and getting it fixed, but it might have been someone else,
>    or it might not have gotten fixed.
>  - GDB needs to get the location of the EH information from glibc
>    somehow. My instinct is to make glibc export this in a global symbol,
>    just like the way we get signal numbers from linuxthreads.
> 
> How does that sound?

Roland (but I'll let him speak) has had a thought about creating a
/proc/pid/vsyscall file, which then gdb could read with
add-symbol-file....

the page is readable right now in 2.5 and the patch for the .eh_frame
has been integrated.

core files will also need to be addressed.

> 
> 
> Note that we don't use eh information on i386 yet. We need to fix
> that. I tried once and got distracted by another project, I think :)

Yep, of course.

elena

> 
> > 
> > Here is an explanation of the problem that Roland has provided:
> > 
> > ---------------
> > Previously asm or C code in libc entered the kernel by setting some
> > registers and using the "int $0x80" instruction. e.g.
> > 
> > 00000000 <__getpid>:
> >    0:   b8 14 00 00 00          mov    $0x14,%eax
> >    5:   cd 80                   int    $0x80
> >    7:   c3                      ret
> > 
> > That is the function called __getpid in libc, the pre-NPTL build. (In the
> > shared library you will see this if you've run with LD_ASSUME_KERNEL=2.4.1
> > so that /lib/i686/libc.so.6 is what you are using.)
> > 
> > In the new libc (/lib/tls/libc.so.6), that function looks like this:
> > 
> > 00000000 <__getpid>:
> >    0:   b8 14 00 00 00          mov    $0x14,%eax
> >    5:   65 ff 15 10 00 00 00    call   *%gs:0x10
> >    c:   c3                      ret
> > 
> > %gs:0x10 is a location that has been initialized to a kernel-supplied
> > special entry point address. In the current kernels, that address is
> > always 0xffffe000.
> > But that is not part of the ABI, which is why it's
> > indirect instead of a literal "call 0xffffe000". The kernel supplies the
> > actual entry point address to libc at startup time, and nothing in the
> > kernel-user interface prevents it from using a different address in each
> > process if it chose to.
> > 
> > The reason for this is that there can be multiple ways to enter the kernel,
> > not just the "int $0x80" trap instruction. Some kernels on some hardware
> > may use a different method that performs better. By using this
> > kernel-supplied entry point address, no user code has to be changed to
> > select the method. It's entirely the kernel's choice.
> > 
> > In all the RH kernels we have right now, the entry point page contains:
> > 
> > 0xffffe000:     int    $0x80
> > 0xffffe002:     ret
> > 
> > But user code cannot presume what this code sequence looks like exactly.
> > It will be some sequence of register and stack moves and special trap
> > instructions, but you have to disassemble to know exactly. In the case
> > above, the PC value seen while a thread is in the kernel is 0xffffe002.
> > You can disassemble the "ret" there and see that you have to pop the PC off
> > the stack to recover the caller's frame.
> > 
> > Another example of what this code might look like when you disassemble it is:
> > 
> > 0xffffe000:     push   %ecx
> > 0xffffe001:     push   %edx
> > 0xffffe002:     push   %ebp
> > 0xffffe003:     mov    %esp,%ebp
> > 0xffffe005:     sysenter
> > 0xffffe007:     nop
> > 0xffffe008:     nop
> > 0xffffe009:     nop
> > 0xffffe00a:     nop
> > 0xffffe00b:     nop
> > 0xffffe00c:     nop
> > 0xffffe00d:     nop
> > 0xffffe00e:     jmp    0xffffe003
> > 0xffffe010:     pop    %ebp
> > 0xffffe011:     pop    %edx
> > 0xffffe012:     pop    %ecx
> > 0xffffe013:     ret
> > 
> > In this example, depending on what happened inside the kernel the PC you
> > usually see may be either 0xffffe00e or 0xffffe010. If the process gets a
> > signal or you attach asynchronously or so forth, the PC might be at any of
> > the earlier instructions as well.
> > You cannot rely on exactly what the
> > sequence is, so you must be able to disassemble from where you are and
> > cope. In this case you will most often see 0xffffe010, in which case you
> > need to pop those three registers and the PC off the stack to restore the
> > caller's frame.
> > 
> > So, these cases are like a leaf function with no debugging info. The
> > first solution idea was interpreting the epilogue code. It will
> > probably be safe to assume that it looks like epilogue code normally
> > does, i.e. register pops and not any arbitrary instructions.
> > 
> > Another solution I was considering is to have the system somewhere provide
> > DWARF unwind info matching the possible PC addresses in the vsyscall page.
> > I am now pretty sure this is the way to go. The recent development is that
> > NPTL now needs .eh_frame information for these PCs as well, and Ulrich has
> > made a kernel change to provide it. The .eh_frame info for the vsyscall
> > PCs is on the same read-only kernel page. The C library now uses this as
> > if the vsyscall page were a DSO with .eh_frame info to register, so that
> > exception-style unwinding from any valid PC in a magic entry point works.
> > 
> > So, there is a .eh_frame section available for this code, and getting it
> > from where it is into gdb can be done by hook or by crook. I have the
> > impression that gdb turning an available .eh_frame section into happy
> > backtraces is something that might be expected real soon now.
> > Sounds like a winner.
> > 
> > I think that elucidates all but the dreariest bits of the technical issues.
> > Now the practical questions. Oh, one dreary bit: 83172 mostly talks about
> > the fact that ptrace refuses to read the 0xffffe000 page for you, which is
> > presumed a prerequisite for dealing with the real can of worms (unwinding).
> > 
> > --------------------
> > 
> > 
> > I think right now the public 2.5 kernel has a fix to make the page
> > readable, and another one to provide the .eh_frame information. There
> > is no mechanism yet to make that debug info accessible to gdb.
> > 
> > 
> > elena
> > 
> 
> -- 
> Daniel Jacobowitz
> MontaVista Software                         Debian GNU/Linux Developer
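
To make the "interpreting the epilogue code" idea from Roland's explanation concrete, here is a minimal C sketch. It is not gdb code, and the function name epilogue_ret_offset is hypothetical. It assumes, as the quoted text suggests is probably safe, that the bytes from the current PC onward are only single-byte register pops, nops, and a final ret; anything else is reported as unrecognized rather than guessed at. The result is how many bytes above the current %esp the caller's return address sits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Scans i386 machine code starting at the current PC and returns the
   number of bytes between the current %esp and the saved return
   address, or -1 if the bytes are not a simple pop/nop/ret epilogue. */
int epilogue_ret_offset(const uint8_t *code, size_t len)
{
    int offset = 0;

    assert(code != NULL);

    for (size_t i = 0; i < len; i++) {
        uint8_t op = code[i];

        if (op == 0xc3)                 /* ret: return address is next on the stack */
            return offset;
        if (op == 0x90)                 /* nop: no effect on the stack */
            continue;
        if (op >= 0x58 && op <= 0x5f && op != 0x5c) {
            offset += 4;                /* pop %reg (except %esp): one 4-byte slot */
            continue;
        }
        return -1;                      /* anything else: give up */
    }
    return -1;                          /* ran out of bytes before a ret */
}
```

For the sysenter example, the bytes starting at 0xffffe010 would be 5d 5a 59 c3 (pop %ebp, pop %edx, pop %ecx, ret), so the return address is 12 bytes above %esp. For the int $0x80 example, the PC 0xffffe002 points at a lone ret, so the return address is at %esp itself. A PC earlier in the sequence (for instance at the sysenter or the jmp) falls outside this assumption and yields -1, which is one reason the .eh_frame approach looks more robust.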