From: Elena Zannoni
Date: Wed, 16 Apr 2003 14:19:00 -0000
To: gdb@sources.redhat.com
Cc: roland@redhat.com, drow@mvista.com
Subject: Linux kernel problem -- food for thoughts
Message-ID: <16029.26499.985342.118733@localhost.redhat.com>

Gdb is currently having a 'little problem' backtracing out of system calls in x86 kernels which support NPTL. I think the current public 2.5 kernel would make this problem show up. Right now, if you are in system calls the backtrace will show up as:

    0xffffe002 in ??

Here is an explanation of the problem that Roland has provided:

---------------

Previously asm or C code in libc entered the kernel by setting some registers and using the "int $0x80" instruction. e.g.

    00000000 <__getpid>:
       0:   b8 14 00 00 00          mov    $0x14,%eax
       5:   cd 80                   int    $0x80
       7:   c3                      ret

That is the function called __getpid in libc, the pre-NPTL build. (In the shared library you will see this if you've run with LD_ASSUME_KERNEL=2.4.1 so that /lib/i686/libc.so.6 is what you are using.)

In the new libc (/lib/tls/libc.so.6), that function looks like this:

    00000000 <__getpid>:
       0:   b8 14 00 00 00          mov    $0x14,%eax
       5:   65 ff 15 10 00 00 00    call   *%gs:0x10
       c:   c3                      ret

%gs:0x10 is a location that has been initialized to a kernel-supplied special entry point address. In the current kernels, that address is always 0xffffe000. But that is not part of the ABI, which is why it's indirect instead of a literal "call 0xffffe000". The kernel supplies the actual entry point address to libc at startup time, and nothing in the kernel-user interface prevents it from using a different address in each process if it chose to.

The reason for this is that there can be multiple ways to enter the kernel, not just the "int $0x80" trap instruction. Some kernels on some hardware may use a different method that performs better. By using this kernel-supplied entry point address, no user code has to be changed to select the method. It's entirely the kernel's choice.

In all the RH kernels we have right now, the entry point page contains:

    0xffffe000:     int    $0x80
    0xffffe002:     ret

But user code cannot presume what this code sequence looks like exactly. It will be some sequence of register and stack moves and special trap instructions, but you have to disassemble to know exactly.

In the case above, the PC value seen while a thread is in the kernel is 0xffffe002. You can disassemble the "ret" there and see that you have to pop the PC off the stack to recover the caller's frame.
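
To make the "pop the PC off the stack" step concrete, here is a minimal sketch in Python. It is an illustration only, not gdb code. The stack snapshot, the load address 0x40000000 and the helper name caller_pc_at_ret are all made up for the example; the only fact taken from the text above is that, with the PC sitting on the "ret" at 0xffffe002, the 32-bit word at the top of the stack is the return address into libc.

```python
import struct


def caller_pc_at_ret(stack: bytes, esp_offset: int) -> int:
    """Return the caller's PC when the thread is stopped on a bare `ret`.

    `stack` is a raw little-endian snapshot of stack memory and `esp_offset`
    is the offset of the current ESP inside that snapshot. Both are
    hypothetical inputs; a debugger would read target memory instead.
    """
    (pc,) = struct.unpack_from("<I", stack, esp_offset)
    return pc


# Hypothetical snapshot: the top word is the return address just past
# `call *%gs:0x10` in __getpid (offset 0xc), assuming libc were loaded at
# 0x40000000. The second word stands in for whatever the caller had pushed.
stack = struct.pack("<II", 0x4000000C, 0xDEADBEEF)

print(hex(caller_pc_at_ret(stack, 0)))  # prints 0x4000000c
```

After this step the unwinder would continue from 0x4000000c with ESP advanced by 4, exactly as for any leaf function that ends in a plain "ret".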

Another example of what this code might look like when you disassemble it is:

    0xffffe000:     push   %ecx
    0xffffe001:     push   %edx
    0xffffe002:     push   %ebp
    0xffffe003:     mov    %esp,%ebp
    0xffffe005:     sysenter
    0xffffe007:     nop
    0xffffe008:     nop
    0xffffe009:     nop
    0xffffe00a:     nop
    0xffffe00b:     nop
    0xffffe00c:     nop
    0xffffe00d:     nop
    0xffffe00e:     jmp    0xffffe003
    0xffffe010:     pop    %ebp
    0xffffe011:     pop    %edx
    0xffffe012:     pop    %ecx
    0xffffe013:     ret

In this example, depending on what happened inside the kernel the PC you usually see may be either 0xffffe00e or 0xffffe010. If the process gets a signal or you attach asynchronously or so forth, the PC might be at any of the earlier instructions as well. You cannot rely on exactly what the sequence is, so you must be able to disassemble from where you are and cope. In this case you will most often see 0xffffe010, in which case you need to pop those three registers and the PC off the stack to restore the caller's frame.

So, these cases are like a leaf function with no debugging info. The first solution idea was interpreting the epilogue code. It will probably be safe to assume that it looks like epilogue code normally does, i.e. register pops and not any arbitrary instructions.

Another solution I was considering is to have the system somewhere provide DWARF unwind info matching the possible PC addresses in the vsyscall page. I am now pretty sure this is the way to go. The recent development is that NPTL now needs .eh_frame information for these PCs as well, and Ulrich has made a kernel change to provide it. The .eh_frame info for the vsyscall PCs is on the same read-only kernel page. The C library now uses this as if the vsyscall page were a DSO with .eh_frame info to register, so that exception-style unwinding from any valid PC in a magic entry point works.

So, there is a .eh_frame section available for this code, and getting it from where it is into gdb can be done by hook or by crook.
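
For comparison, the first idea (interpreting the epilogue) can be sketched roughly as follows. This is a toy in Python, not a proposal for gdb's implementation. It assumes the instructions have already been decoded into a small table (the EPILOGUE list below is copied by hand from the listing above), it only understands "pop" and "ret", and the stack contents, the ESP value 0xbffff000 and the names unwind_epilogue and read_u32 are all hypothetical.

```python
import struct

# Hypothetical, pre-decoded tail of the sysenter-style entry page quoted
# above. A real debugger would disassemble target memory instead.
EPILOGUE = [
    (0xFFFFE010, "pop", "ebp"),
    (0xFFFFE011, "pop", "edx"),
    (0xFFFFE012, "pop", "ecx"),
    (0xFFFFE013, "ret", None),
]


def unwind_epilogue(insns, pc, esp, read_u32):
    """Simulate pops and the final ret from `pc`; return the caller's registers."""
    start = next((i for i, (addr, _, _) in enumerate(insns) if addr == pc), None)
    if start is None:
        raise ValueError("pc is not inside the decoded epilogue")
    regs = {}
    for _, op, reg in insns[start:]:
        if op == "pop":
            regs[reg] = read_u32(esp)  # restore the saved register
            esp += 4
        elif op == "ret":
            regs["eip"] = read_u32(esp)  # return address is the caller's PC
            regs["esp"] = esp + 4        # caller's ESP after the ret
            return regs
        else:
            raise ValueError("not a simple epilogue instruction: " + op)
    raise ValueError("ran off the end without finding ret")


# Hypothetical stack at the time the PC is 0xffffe010: saved ebp, edx, ecx,
# then the return address into libc (all values invented for the example).
STACK_BASE = 0xBFFFF000
stack = struct.pack("<IIII", 0xBFFFF100, 0x22222222, 0x11111111, 0x4000000C)


def read_u32(addr):
    """Read a little-endian 32-bit word from the fake stack snapshot."""
    return struct.unpack_from("<I", stack, addr - STACK_BASE)[0]


frame = unwind_epilogue(EPILOGUE, 0xFFFFE010, STACK_BASE, read_u32)
print({name: hex(value) for name, value in frame.items()})
```

Note that this toy deliberately gives up on anything that is not a pop or a ret, so a PC at 0xffffe00e (the jmp) is rejected. That limitation is part of why unwind info covering every PC in the page looks more attractive than guessing from the instruction stream.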

I have the impression that gdb turning an available .eh_frame section into happy backtraces is something that might be expected real soon now. Sounds like a winner.

I think that elucidates all but the dreariest bits of the technical issues. Now the practical questions.

Oh, one dreary bit: 83172 mostly talks about the fact that ptrace refuses to read the 0xffffe000 page for you, which is presumed a prerequisite for dealing with the real can of worms (unwinding).

--------------------

I think right now the public 2.5 kernel has a fix to make the page readable, and another one to provide the .eh_frame information. There is no mechanism yet to make that debug info accessible to gdb.

elena