From: Daniel Jacobowitz <drow@mvista.com>
To: Elena Zannoni <ezannoni@redhat.com>
Cc: gdb@sources.redhat.com, roland@redhat.com
Subject: Re: Linux kernel problem -- food for thoughts
Date: Wed, 16 Apr 2003 14:42:00 -0000
Message-ID: <20030416144158.GA10060@nevyn.them.org>
In-Reply-To: <16029.27648.37989.683217@localhost.redhat.com>

On Wed, Apr 16, 2003 at 10:43:12AM -0400, Elena Zannoni wrote:
> Daniel Jacobowitz writes:
> > On Wed, Apr 16, 2003 at 10:24:03AM -0400, Elena Zannoni wrote:
> > >
> > > Gdb is currently having a 'little problem' backtracing out of system
> > > calls in x86 kernels which support NPTL. I think the current public
> > > 2.5 kernel would make this problem show up.
> > >
> > > Right now, if you are in system calls the backtrace will show up as:
> > >
> > > 0xffffe002 in ??
> >
> > I was just thinking about this. My reaction is:
> > - the page needs to be readable; I vaguely remember badgering Linus
> > about this and getting it fixed, but it might have been someone else,
> > or it might not have gotten fixed.
> > - GDB needs to get the location of the EH information from glibc
> > somehow. My instinct is to make glibc export this in a global symbol,
> > just like the way we get signal numbers from linuxthreads.
> >
> > How does that sound?
>
> Roland (but I'll let him speak) has had a thought about creating a
> /proc/pid/vsyscall file, which gdb could then read with add-symbol-file....
>
> The page is readable right now in 2.5, and the patch for the .eh_frame
> has been integrated.
>
> Core files will also need to be addressed.

Oww. Should we just include the page in core dumps? That might be
the simplest solution.

Roland, I think that doing it in glibc is a better idea than doing it
in /proc somewhere; it will make remote debugging and core debugging
more straightforward.

> > Note that we don't use eh information on i386 yet. We need to fix
> > that. I tried once and got distracted by another project, I think :)
>
> Yep, of course.
>
> elena
>
>
> >
> > >
> > > Here is an explanation of the problem that Roland has provided:
> > >
> > > ---------------
> > > Previously asm or C code in libc entered the kernel by setting some
> > > registers and using the "int $0x80" instruction. e.g.
> > >
> > > 00000000 <__getpid>:
> > > 0: b8 14 00 00 00 mov $0x14,%eax
> > > 5: cd 80 int $0x80
> > > 7: c3 ret
> > >
> > > That is the function called __getpid in libc, the pre-NPTL build. (In the
> > > shared library you will see this if you've run with LD_ASSUME_KERNEL=2.4.1
> > > so that /lib/i686/libc.so.6 is what you are using.)
> > >
> > > In the new libc (/lib/tls/libc.so.6), that function looks like this:
> > >
> > > 00000000 <__getpid>:
> > > 0: b8 14 00 00 00 mov $0x14,%eax
> > > 5: 65 ff 15 10 00 00 00 call *%gs:0x10
> > > c: c3 ret
> > >
> > > %gs:0x10 is a location that has been initialized to a kernel-supplied
> > > special entry point address. In the current kernels, that address is
> > > always 0xffffe000. But that is not part of the ABI, which is why it's
> > > indirect instead of a literal "call 0xffffe000". The kernel supplies the
> > > actual entry point address to libc at startup time, and nothing in the
> > > kernel-user interface prevents it from using a different address in each
> > > process if it chose to.
> > >
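A minimal sketch of how a process could recover that address, assuming the
kernel hands it over in the ELF auxiliary vector under the tag AT_SYSINFO
(value 32 in <elf.h>). The explanation above does not name the mechanism, so
treat that as an assumption; the helper name find_sysinfo and the sample
bytes are made up for illustration.

```python
import struct

AT_NULL = 0       # terminates the auxiliary vector
AT_SYSINFO = 32   # assumed tag carrying the kernel-supplied entry point

def find_sysinfo(auxv: bytes):
    """Scan a 32-bit little-endian auxv image for AT_SYSINFO.

    Each entry is two 32-bit words: (tag, value).
    Returns the entry point address, or None if the tag is absent.
    """
    for off in range(0, len(auxv) - 7, 8):
        tag, value = struct.unpack_from("<II", auxv, off)
        if tag == AT_NULL:
            break
        if tag == AT_SYSINFO:
            return value
    return None

# Hypothetical auxv image: AT_PAGESZ (6) = 4096, then AT_SYSINFO, then AT_NULL.
sample = struct.pack("<6I", 6, 4096, AT_SYSINFO, 0xFFFFE000, AT_NULL, 0)
entry = find_sysinfo(sample)
```

Looking the value up at run time, rather than hard-coding 0xffffe000, is what
keeps the address out of the ABI.
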
> > > The reason for this is that there can be multiple ways to enter the kernel,
> > > not just the "int $0x80" trap instruction. Some kernels on some hardware
> > > may use a different method that performs better. By using this
> > > kernel-supplied entry point address, no user code has to be changed to
> > > select the method. It's entirely the kernel's choice.
> > >
> > > In all the RH kernels we have right now, the entry point page contains:
> > >
> > > 0xffffe000: int $0x80
> > > 0xffffe002: ret
> > >
> > > But user code cannot presume what this code sequence looks like exactly.
> > > It will be some sequence of register and stack moves and special trap
> > > instructions, but you have to disassemble to know exactly. In the case
> > > above, the PC value seen while a thread is in the kernel is 0xffffe002.
> > > You can disassemble the "ret" there and see that you have to pop the PC off
> > > the stack to recover the caller's frame.
> > >
> > > Another example of what this code might look like when you disassemble it is:
> > >
> > > 0xffffe000: push %ecx
> > > 0xffffe001: push %edx
> > > 0xffffe002: push %ebp
> > > 0xffffe003: mov %esp,%ebp
> > > 0xffffe005: sysenter
> > > 0xffffe007: nop
> > > 0xffffe008: nop
> > > 0xffffe009: nop
> > > 0xffffe00a: nop
> > > 0xffffe00b: nop
> > > 0xffffe00c: nop
> > > 0xffffe00d: nop
> > > 0xffffe00e: jmp 0xffffe003
> > > 0xffffe010: pop %ebp
> > > 0xffffe011: pop %edx
> > > 0xffffe012: pop %ecx
> > > 0xffffe013: ret
> > >
> > > In this example, depending on what happened inside the kernel the PC you
> > > usually see may be either 0xffffe00e or 0xffffe010. If the process gets a
> > > signal or you attach asynchronously or so forth, the PC might be at any of
> > > the earlier instructions as well. You cannot rely on exactly what the
> > > sequence is, so you must be able to disassemble from where you are and
> > > cope. In this case you will most often see 0xffffe010, in which case you
> > > need to pop those three registers and the PC off the stack to restore the
> > > caller's frame.
> > >
> > > So, these cases are like a leaf function with no debugging info. The
> > > first solution idea was interpreting the epilogue code. It will
> > > probably be safe to assume that it looks like epilogue code normally
> > > does, i.e. register pops and not any arbitrary instructions.
> > >
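The epilogue-interpreting idea could be sketched roughly as below. This is
only an illustration, not what gdb does: it recognizes just the one-byte
"pop %reg" opcodes (0x58 to 0x5f) and "ret" (0xc3), and gives up on anything
else. The function name, the stack addresses and the return address are
hypothetical; the code bytes are hand-assembled from the two listings quoted
above.

```python
POP_R32 = range(0x58, 0x60)  # one-byte "pop %reg" opcodes
RET = 0xC3                   # near return

def unwind_epilogue(code, base, pc, sp, stack):
    """Step over pops until a ret; return (caller_pc, caller_sp).

    code  -- bytes of the entry-point page, starting at address base
    stack -- maps a stack address to the 32-bit word stored there
    Raises ValueError on any instruction that is not a pop or a ret.
    """
    i = pc - base
    while i < len(code):
        op = code[i]
        if op in POP_R32:
            sp += 4              # skip over the saved register
            i += 1
        elif op == RET:
            return stack[sp], sp + 4
        else:
            raise ValueError(f"not epilogue code at {base + i:#x}")
    raise ValueError("ran off the end of the code")

BASE = 0xFFFFE000
# push ecx/edx/ebp; mov %esp,%ebp; sysenter; 7 nops; jmp; pop ebp/edx/ecx; ret
SYSENTER_PAGE = bytes.fromhex("51 52 55 89e5 0f34 90909090909090 ebf3 5d 5a 59 c3")
# int $0x80; ret
INT80_PAGE = bytes.fromhex("cd80 c3")

# Hypothetical stack: three saved registers, then the return address.
stack = {0xBFFFF00C: 0x08048123}
sysenter_frame = unwind_epilogue(SYSENTER_PAGE, BASE, 0xFFFFE010, 0xBFFFF000, stack)
int80_frame = unwind_epilogue(INT80_PAGE, BASE, 0xFFFFE002, 0xBFFFF00C, stack)
```

A PC of 0xffffe00e (the jmp) makes this sketch raise ValueError, which shows
the limit of the approach: it only copes when the PC is already in the
epilogue.
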
> > > Another solution I was considering is to have the system somewhere provide
> > > DWARF unwind info matching the possible PC addresses in the vsyscall page.
> > > I am now pretty sure this is the way to go. The recent development is that
> > > NPTL now needs .eh_frame information for these PCs as well, and Ulrich has
> > > made a kernel change to provide it. The .eh_frame info for the vsyscall
> > > PCs is on the same read-only kernel page. The C library now uses this as
> > > if the vsyscall page were a DSO with .eh_frame info to register, so that
> > > exception-style unwinding from any valid PC in a magic entry point works.
> > >
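To make the unwind-info approach concrete, here is a hand-written stand-in
for what such a table expresses, for the sysenter sequence quoted earlier.
Real .eh_frame data is encoded as CIE/FDE records with call-frame
instructions; the ROWS table below is my own derivation from the listing, and
the names and stack values are hypothetical.

```python
from bisect import bisect_right

BASE = 0xFFFFE000
END = 0x14  # one past the ret at offset 0x13

# (first offset the row applies to, distance from ESP to the CFA)
ROWS = [
    (0x00, 4),   # on entry: only the return address is on the stack
    (0x01, 8),   # after push %ecx
    (0x02, 12),  # after push %edx
    (0x03, 16),  # after push %ebp, through sysenter and the jmp
    (0x11, 12),  # after pop %ebp
    (0x12, 8),   # after pop %edx
    (0x13, 4),   # after pop %ecx, sitting on the ret
]
STARTS = [start for start, _ in ROWS]

def unwind_by_table(pc, sp, stack):
    """Return (caller_pc, caller_sp) for any PC inside the sequence.

    stack maps a stack address to the 32-bit word stored there.
    The return address lives in the word just below the CFA.
    """
    off = pc - BASE
    if not 0 <= off < END:
        raise LookupError(f"no unwind row covers {pc:#x}")
    cfa = sp + ROWS[bisect_right(STARTS, off) - 1][1]
    return stack[cfa - 4], cfa

# Hypothetical stack: return address stored at 0xbffff00c.
stack = {0xBFFFF00C: 0x08048123}
at_pop = unwind_by_table(0xFFFFE010, 0xBFFFF000, stack)
at_jmp = unwind_by_table(0xFFFFE00E, 0xBFFFF000, stack)
at_ret = unwind_by_table(0xFFFFE013, 0xBFFFF00C, stack)
```

Unlike scanning the epilogue, the lookup works for every PC in the sequence,
including the jmp at 0xffffe00e, and needs no disassembly.
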
> > > So, there is a .eh_frame section available for this code, and getting it
> > > from where it is into gdb can be done by hook or by crook. I have the
> > > impression that gdb turning an available .eh_frame section into happy
> > > backtraces is something that might be expected real soon now.
> > > Sounds like a winner.
> > >
> > > I think that elucidates all but the dreariest bits of the technical issues.
> > > Now the practical questions. Oh, one dreary bit: 83172 mostly talks about
> > > the fact that ptrace refuses to read the 0xffffe000 page for you, which is
> > > presumed a prerequisite for dealing with the real can of worms (unwinding).
> > >
> > > --------------------
> > >
> > >
> > > I think right now the public 2.5 kernel has a fix to make the page
> > > readable, and another one to provide the .eh_frame information. There
> > > is no mechanism yet to make that debug info accessible to gdb.
> > >
> > >
> > > elena
> > >
> >
> > --
> > Daniel Jacobowitz
> > MontaVista Software Debian GNU/Linux Developer
>
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer