notify_page_fault() problem

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

From: Quentin Barnes <qbarnes@urbana.css.mot.com>
To: systemtap@sources.redhat.com
Subject: notify_page_fault() problem
Date: Mon, 30 Apr 2007 20:19:00 -0000	[thread overview]
Message-ID: <20070430201931.GA7328@urbana.css.mot.com> (raw)

In testing the ARM Kprobes code, I hit a bug where the kernel would
occasionally lock up.  I tracked this down to the page fault handling
code.

Once I had a grasp of how the code is working, I'm wondering why other
Kprobes implementations aren't hitting the same bug.

On i386, I see they have a fix or workaround for this issue, but in
skimming the code base I don't see something like it on any other
Kprobes supporting platform.

The manifestation of the problem is that the kernel can get stuck
in an infinite loop with repeating page faults until it stack
overruns.  I understand how and why it is doing it, but exactly what
is considered the best fix is unclear to me since I'm not a Linux
kernel expert.

The start of the problem is that the data space for Kprobes modules
comes out of the module_alloc'd memory region.  Like vmalloc'd
memory this memory can cause an MMU fault, which is as expected.
Like vmalloc'd memory, the kernel fills in the hardware PTEs for its
MMU for module_alloc'd memory on an on-demand basis.

The problem is that the call to notify_page_fault() is right up
front in most architecture's data fault handling code.  The function
notify_page_fault() and its registered chain routines can be invoked
_before_ the kernel's had a chance to fill in its mapping.  See the
problem yet?  The MMU generates a fault, the arch-specific fault
handler is called, it immediately calls notify_page_fault() which
calls kprobe_exceptions_notify() which calls kprobe_fault_handler()
which calls get_kprobe().  The function get_kprobe() dereferences
a kprobe data structure which the structure isn't always mapped in
hardware yet, so it faults.  Now we re-start the entire call
sequence again, and then again, and again, and again until the
system falls over.

Now on i386's do_page_fault(), it avoids the above infinite
recursion by checking to see if the fault happened in kernel space.
If it did, it makes sure it was the right kind of fault, and, if
it is, it calls vmalloc_fault() which will attempt to map the missing
page translation with vmalloc_sync_one().  If it is successful,
do_page_fault() returns without letting notify_page_fault() get
called.

This support code on i386 (vmalloc_fault() and vmalloc_sync_one())
is in its architecture-specific layer.  If this is considered
the best fix for this issue (instead of moving the call, more
below), rather than everyone go off and implement it themselves
in their own layer, should something like it be implemented in
architecture-independent code?

One thing I don't understand is why notify_page_fault() is called
so early in everyone's page fault handling code.  Shouldn't it be
called just before the kernel's about to give up, instead of right
up front?

Page fault handling code is a serious critical path in any kernel.
As it is by putting the notify_page_fault() call so early, just
having the check there, whether or not any hooks are registered,
has to have a detrimental impact on overall system performance.  If
it is treated as an option of last resort like other 'die' chain
services, this wouldn't have been a problem in the first place.  Why
did someone find it important for the check not to go last?

For now to workaround this bug for the ARM Kprobes port, I just
removed the call to notify_page_fault() for kernel translation
faults.  I wanted to hear some comments on this problem before
working on a fix.

Quentin

next             reply	other threads:[~2007-04-30 20:19 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-30 20:19 Quentin Barnes [this message]
2007-04-30 20:25 ` Andi Kleen
2007-04-30 21:15   ` Quentin Barnes
2007-05-01  2:57     ` Andi Kleen
2007-05-01  3:24       ` Quentin Barnes
2007-05-01 15:40         ` Frank Ch. Eigler
2007-05-01 15:43           ` Quentin Barnes
2007-05-02 10:33           ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070430201931.GA7328@urbana.css.mot.com \
    --to=qbarnes@urbana.css.mot.com \
    --cc=systemtap@sources.redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).