public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* kprobe fault handling
@ 2005-09-06  1:06 Frank Ch. Eigler
  0 siblings, 0 replies; 33+ messages in thread
From: Frank Ch. Eigler @ 2005-09-06  1:06 UTC (permalink / raw)
  To: systemtap

Hi -

I'm about to commit some translator code to set a fault_handler for
systemtap kprobes, as additional safety protection.  To test it, I've
tried the following little crashme script, which as you see
dereferences NULL.

  function ta () %{ * (volatile char *) 0 = 0; %}
  probe kernel.SOMEPLACE { ta () }

I'm getting far enough along with this that the fault handler is
indeed run, and a nice little message is sent via printk().  However,
what I'd like to do is signal that the probe handler should more or
less resume (and soon unwind and clean itself up with an error).

A "return 0" seems inappropriate, since this would trigger a kernel
panic.  A "return 1" seems not to work the way I expect, in that in my
tests, this gets us into an infinite loop, as if it was retrying the
NULL dereference every time.

What can I do from the fault handler to make sure that pre_handler
execution is resumed, but the faulting instruction is *skipped*?
Actually I don't even need the pre_handler to resume running, just the
kernel in general (single-stepping over the probed insn, and so on).
Is there a way to accomplish that?

- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-12  1:26                                       ` Martin Hunt
@ 2006-02-13 13:39                                         ` Frank Ch. Eigler
  0 siblings, 0 replies; 33+ messages in thread
From: Frank Ch. Eigler @ 2006-02-13 13:39 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap

Hi -

hunt wrote:

> > Why was that news, given my repeated warning to this effect?
>
> The problem with your warnings is that they are vague and lack any
> analysis or data.  Saying that there may be bugs now or in the
> future in a system is not the same as reporting a specific bug.

And yet, that is what code reviewers do.  There is no obligation to
produce a complete bug report.  A well-founded concern (your call of a
generally sleep-capable function, deliberately bypassing of the
might_sleep warning, offering little justification and testing, and
until recently, no analysis) should have been enough.  Had you taken
my early warnings more seriously, the problem may have been solved
already.


> [...]  And now we're off on a completely different course. [...]
> But not in the copy functions which don't sleep, lock, or
> reschedule.

Maybe you are missing the fact that consequences of a normal fault can
*include* sleeping, locking, rescheduling.


> > [...] How much analysis went into your variant, beyond bypassing
> > the might_sleep warning?
>
> Certainly less time than I have spent arguing over it.  I writing a
> quick summary of my analysis and posting it separately. [...]

Thank you for doing the analysis now.  It looks plausible, though will
need more testing and may veer during the ongoing discussion regarding
improved kprobes fault handling.


- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-11  0:49                                     ` Frank Ch. Eigler
@ 2006-02-12  1:26                                       ` Martin Hunt
  2006-02-13 13:39                                         ` Frank Ch. Eigler
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-12  1:26 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Fri, 2006-02-10 at 19:49 -0500, Frank Ch. Eigler wrote: 
> Hi -
> 
> > > Those kernel functions are similarly unsafe (for purposes of
> > > systemtap), since they can sleep (wait while page faults are being
> > > serviced).  
> >
> > I started this whole thread to explain that my tests were now
> > showing that was indeed the case.
> 
> Why was that news, given my repeated warning to this effect?

The problem with your warnings is that they are vague and lack any
analysis or data.  Saying that there may be bugs now or in the future in
a system is not the same as reporting a specific bug.

> > However that was due to an easily fixed bug in the fault handler.
> 
> Perhaps so, but:
> 
> > You can't deem high-level functions unsafe to use because a bug in a
> > lower-level routine temporarily made them that way.
> 
> Temporarily?  And it's not just that routine.  The larger problem is
> sleeping/rescheduling/locking, not just faulting.  This lesson made an
> earlier appearance with printk.

And now we're off on a completely different course. You have bad
feelings that there are bugs involving sleeping/rescheduling/locking
somewhere. I agree there could be a few we haven't detected; I haven't
finished reviewing the code for them. But not in the copy functions
which don't sleep, lock, or reschedule.

> > > This is why Roland went out of his way to collect
> > > alternatives in loc2c-runtime.h.  This was explained at the time.
> > 
> > IIRC, he explained to you why using __get_user_asm was safe. That is the
> > same function used by copy_from_user and get_user. 
> 
> It may be that even those are not sufficiently safe (i.e., not
> stressed enough on pessimistic cases such as valid user addresses that
> are paged out).  Or maybe they are used just differently enough to
> have made them work.  How much analysis went into your variant, beyond
> bypassing the might_sleep warning?

Certainly less time than I have spent arguing over it.  I writing a
quick summary of my analysis and posting it separately. We can continue
this debate there if you wish. 

Martin


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10 23:36                                   ` Martin Hunt
@ 2006-02-11  0:49                                     ` Frank Ch. Eigler
  2006-02-12  1:26                                       ` Martin Hunt
  0 siblings, 1 reply; 33+ messages in thread
From: Frank Ch. Eigler @ 2006-02-11  0:49 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap

Hi -

> > Those kernel functions are similarly unsafe (for purposes of
> > systemtap), since they can sleep (wait while page faults are being
> > serviced).  
>
> I started this whole thread to explain that my tests were now
> showing that was indeed the case.

Why was that news, given my repeated warning to this effect?

> However that was due to an easily fixed bug in the fault handler.

Perhaps so, but:

> You can't deem high-level functions unsafe to use because a bug in a
> lower-level routine temporarily made them that way.

Temporarily?  And it's not just that routine.  The larger problem is
sleeping/rescheduling/locking, not just faulting.  This lesson made an
earlier appearance with printk.

> > This is why Roland went out of his way to collect
> > alternatives in loc2c-runtime.h.  This was explained at the time.
> 
> IIRC, he explained to you why using __get_user_asm was safe. That is the
> same function used by copy_from_user and get_user. 

It may be that even those are not sufficiently safe (i.e., not
stressed enough on pessimistic cases such as valid user addresses that
are paged out).  Or maybe they are used just differently enough to
have made them work.  How much analysis went into your variant, beyond
bypassing the might_sleep warning?

- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10 22:47                                 ` Frank Ch. Eigler
@ 2006-02-10 23:36                                   ` Martin Hunt
  2006-02-11  0:49                                     ` Frank Ch. Eigler
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-10 23:36 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Fri, 2006-02-10 at 17:47 -0500, Frank Ch. Eigler wrote:

> > What makes the __copy_from_user functions more unsafe than the
> > strncpy_from_user or get_user functions?  Or are they all unsafe?
> 
> Those kernel functions are similarly unsafe (for purposes of
> systemtap), since they can sleep (wait while page faults are being
> serviced).  

I started this whole thread to explain that my tests were now showing
that was indeed the case. However that was due to an easily fixed bug in
the fault handler. You can't deem high-level functions unsafe to use
because a bug in a lower-level routine temporarily made them that way.

> This is why Roland went out of his way to collect
> alternatives in loc2c-runtime.h.  This was explained at the time.

IIRC, he explained to you why using __get_user_asm was safe. That is the
same function used by copy_from_user and get_user. 

Martin


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10 22:41                               ` Martin Hunt
@ 2006-02-10 22:47                                 ` Frank Ch. Eigler
  2006-02-10 23:36                                   ` Martin Hunt
  0 siblings, 1 reply; 33+ messages in thread
From: Frank Ch. Eigler @ 2006-02-10 22:47 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap

Hi -

On Fri, Feb 10, 2006 at 02:42:23PM -0800, Martin Hunt wrote:
> > Er, the "__copy_from_user" family?!  The "_inatomic" function, whose
> > name encodes its very inatomicity?
> 
> Don't read too much into a name.

What led you to read too little into it?

> What makes the __copy_from_user functions more unsafe than the
> strncpy_from_user or get_user functions?  Or are they all unsafe?

Those kernel functions are similarly unsafe (for purposes of
systemtap), since they can sleep (wait while page faults are being
serviced).  This is why Roland went out of his way to collect
alternatives in loc2c-runtime.h.  This was explained at the time.

- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10 22:20                             ` Frank Ch. Eigler
@ 2006-02-10 22:41                               ` Martin Hunt
  2006-02-10 22:47                                 ` Frank Ch. Eigler
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-10 22:41 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Fri, 2006-02-10 at 17:20 -0500, Frank Ch. Eigler wrote:
> Hi -
> 
> > > Always for the same reason: because it invokes functions that can
> > > might sleep (and are in fact documented as such).  
> > 
> > Which function are those?
> 
> Er, the "__copy_from_user" family?!  The "_inatomic" function, whose
> name encodes its very inatomicity?

Don't read too much into a name.

What makes the __copy_from_user functions more unsafe than the
strncpy_from_user or get_user functions?  Or are they all unsafe?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10 22:17                           ` Martin Hunt
@ 2006-02-10 22:20                             ` Frank Ch. Eigler
  2006-02-10 22:41                               ` Martin Hunt
  0 siblings, 1 reply; 33+ messages in thread
From: Frank Ch. Eigler @ 2006-02-10 22:20 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap

Hi -

> > Always for the same reason: because it invokes functions that can
> > might sleep (and are in fact documented as such).  
> 
> Which function are those?

Er, the "__copy_from_user" family?!  The "_inatomic" function, whose
name encodes its very inatomicity?

- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10 22:12                         ` Frank Ch. Eigler
@ 2006-02-10 22:17                           ` Martin Hunt
  2006-02-10 22:20                             ` Frank Ch. Eigler
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-10 22:17 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Fri, 2006-02-10 at 17:12 -0500, Frank Ch. Eigler wrote:
> Hi -
> 
> > > As mentioned several times, this runtime routine "_stp_copy_from_user"
> > > is not safe, and should be rewritten or removed.
> > 
> > Yes, you always say this. Please tell us why that is so.
> 
> Always for the same reason: because it invokes functions that can
> might sleep (and are in fact documented as such).  

Which function are those?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10 21:55                       ` Martin Hunt
@ 2006-02-10 22:12                         ` Frank Ch. Eigler
  2006-02-10 22:17                           ` Martin Hunt
  0 siblings, 1 reply; 33+ messages in thread
From: Frank Ch. Eigler @ 2006-02-10 22:12 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap

Hi -

> > As mentioned several times, this runtime routine "_stp_copy_from_user"
> > is not safe, and should be rewritten or removed.
> 
> Yes, you always say this. Please tell us why that is so.

Always for the same reason: because it invokes functions that can
might sleep (and are in fact documented as such).  Don't be fooled by
the clever trick of bypassing the "might_sleep" warning using that
"...._inatomic" lower level call.

- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10 21:46                     ` Frank Ch. Eigler
@ 2006-02-10 21:55                       ` Martin Hunt
  2006-02-10 22:12                         ` Frank Ch. Eigler
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-10 21:55 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Fri, 2006-02-10 at 16:45 -0500, Frank Ch. Eigler wrote:
> hunt wrote:
> 
> > [...]
> > function copy_more:long(addr:long)
> > %{
> >       THIS->__retvalue = _stp_copy_from_user (buf, 
> > 	(char __user *)(long)THIS->addr, 2048);
> > %}
> > [...]
> 
> As mentioned several times, this runtime routine "_stp_copy_from_user"
> is not safe, and should be rewritten or removed.

Yes, you always say this. Please tell us why that is so.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-10  5:39                   ` Martin Hunt
@ 2006-02-10 21:46                     ` Frank Ch. Eigler
  2006-02-10 21:55                       ` Martin Hunt
  0 siblings, 1 reply; 33+ messages in thread
From: Frank Ch. Eigler @ 2006-02-10 21:46 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap


hunt wrote:

> [...]
> function copy_more:long(addr:long)
> %{
>       THIS->__retvalue = _stp_copy_from_user (buf, 
> 	(char __user *)(long)THIS->addr, 2048);
> %}
> [...]

As mentioned several times, this runtime routine "_stp_copy_from_user"
is not safe, and should be rewritten or removed.

This is a separate matter from the kprobes fault handler doing the
right thing.  Once it works again the way it did in dprobes, an access
fault will most likely be made to cause a probe handler abort, not
just a polite local -EFAULT result code.

- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-09 22:06                 ` Martin Hunt
@ 2006-02-10  5:39                   ` Martin Hunt
  2006-02-10 21:46                     ` Frank Ch. Eigler
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-10  5:39 UTC (permalink / raw)
  To: Jim Keniston; +Cc: prasanna, Richard J Moore, suparna, systemtap

[-- Attachment #1: Type: text/plain, Size: 408 bytes --]

Maybe I should post one of my test programs. The attached one crashes on
my i686 system running an SMP kernel in a couple of seconds. On non-SMP
kernels, it prints a bunch of junk to syslog but doesn't seem to crash,
although I haven't tested it much.  With my simple fixup_exception patch
to the handler, it runs perfectly. I've modified it to keep looping
through the address space and run it for hours.



[-- Attachment #2: mem_test.stp --]
[-- Type: text/plain, Size: 619 bytes --]

# Memory test. Tries userspace copies covering
# the entire 32-bit address space.

global addr

%{
	static char buf[16384];
%}

function copy_more:long(addr:long)
%{
      THIS->__retvalue = _stp_copy_from_user (buf, 
	(char __user *)(long)THIS->addr, 2048);
%}

probe kernel.function("sys_read") {
	printf("copying from %x ... ", addr)
	ret = copy_more(addr)
	if (ret < 2048)
		printf("FAILED (%d)\n", ret)
	else
		printf("GOOD\n")
	addr += 4096
	if (addr > 0xffffffff)
		exit()
}

probe end {
	printf("\nDONE. Test succeeded if you see this and there\n")
	printf("are no might_sleep warnings in the system log.\n")
}

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-09 21:35               ` Jim Keniston
@ 2006-02-09 22:06                 ` Martin Hunt
  2006-02-10  5:39                   ` Martin Hunt
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-09 22:06 UTC (permalink / raw)
  To: Jim Keniston; +Cc: prasanna, Richard J Moore, suparna, systemtap

On Thu, 2006-02-09 at 13:35 -0800, Jim Keniston wrote:

> > +		/*
> > +		 * In case the user-specified fault handler returned zero,
> > +		 * try to fix up.
> > +		 */
> > +
> > +		if (fixup_exception(regs))
> > +			return 1;
> 
> I think it's OK to call fixup_exceptions() here, but I believe it's
> redundant.  I understood Suparna to say
> (http://sourceware.org/ml/systemtap/2006-q1/msg00423.html) that if we
> return 0, do_page_fault() will call fixup_exceptions() instead of trying
> to bring in the missing page (since it's a kernel instruction -- in a
> handler -- that faulted).  Her explanation made sense to me.

But experimentally things don't work the way they should.
I see lots of these

Feb  2 23:15:53 monkey2 kernel: Debug: sleeping function called from
invalid context at mm/page_alloc.c:618
Feb  2 23:15:53 monkey2 kernel: in_atomic():0[expected: 0],
irqs_disabled():1
Feb  2 23:15:53 monkey2 kernel:  [<c011df50>] __might_sleep+0x7d/0x89
Feb  2 23:15:53 monkey2 kernel:  [<c014b802>] __alloc_pages+0x3a/0x2f7
Feb  2 23:15:53 monkey2 kernel:  [<c0157a48>] do_no_page+0x55/0x3bf
Feb  2 23:15:53 monkey2 kernel:  [<c011a19e>] pte_alloc_one+0x18/0x49
Feb  2 23:15:53 monkey2 kernel:  [<c015553d>] pte_alloc_map+0x66/0x12d
Feb  2 23:15:53 monkey2 kernel:  [<c0157f6d>] handle_mm_fault+0xb0/0x1fd
Feb  2 23:15:53 monkey2 kernel:  [<c011a8ed>] do_page_fault+0x1ac/0x4dc
Feb  2 23:15:53 monkey2 kernel:  [<c02ab2c1>] sock_aio_write+0x106/0x113
Feb  2 23:15:53 monkey2 kernel:  [<c0119263>] kprobe_exceptions_notify
+0xc6/0x123
Feb  2 23:15:53 monkey2 kernel:  [<c011a741>] do_page_fault+0x0/0x4dc
Feb  2 23:15:53 monkey2 kernel:  [<c030fa4f>] error_code+0x2f/0x38
Feb  2 23:15:53 monkey2 kernel:  [<c01e6028>] __copy_from_user_ll
+0x30/0x48
Feb  2 23:15:53 monkey2 kernel:  [<e0b9dac7>] _stp_copy_from_user
+0x2d/0x4f [copy]
Feb  2 23:15:53 monkey2 kernel:  [<c0168211>] sys_read+0x0/0x62
Feb  2 23:15:53 monkey2 kernel:  [<e0b9dc40>] inst_sys_read+0x15/0x45
[copy]
Feb  2 23:15:53 monkey2 kernel:  [<c0119020>] kprobe_handler+0x1f0/0x230
Feb  2 23:15:53 monkey2 kernel:  [<c01191ce>] kprobe_exceptions_notify
+0x31/0x123
Feb  2 23:15:53 monkey2 kernel:  [<c0130c59>] notifier_call_chain
+0x17/0x2e
Feb  2 23:15:53 monkey2 kernel:  [<c01076f7>] do_int3+0x3d/0xcf
Feb  2 23:15:53 monkey2 kernel:  [<c0143260>] audit_syscall_entry
+0x124/0x13d
Feb  2 23:15:53 monkey2 kernel:  [<c030fbaf>] int3+0x1f/0x30
Feb  2 23:15:53 monkey2 kernel:  [<c0168212>] sys_read+0x1/0x62
Feb  2 23:15:53 monkey2 kernel:  [<c030f8cb>] syscall_call+0x7/0xb
Feb  2 23:15:53 monkey2 kernel:  [<c030007b>] xfrm_policy_gc_kill
+0x39/0x68

The above only happens on non-smp machines.  On SMP, I usually get
crashes.  Putting the fixup_exception() call in got rid of the messages
and crashes for me.

That's as far as I have investigated. 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-09  7:23             ` Prasanna S Panchamukhi
  2006-02-09 16:33               ` Keshavamurthy Anil S
@ 2006-02-09 21:35               ` Jim Keniston
  2006-02-09 22:06                 ` Martin Hunt
  1 sibling, 1 reply; 33+ messages in thread
From: Jim Keniston @ 2006-02-09 21:35 UTC (permalink / raw)
  To: prasanna; +Cc: Richard J Moore, suparna, Martin Hunt, SystemTAP

On Wed, 2006-02-08 at 23:23, Prasanna S Panchamukhi wrote:
> Hi,
> 
> Please find the patch below to provide robust kprobe fault handling.
> Presently this patch is for i386 architecture, will port to other
> architectures if this approach looks ok. This patch is currently
> untested, appreciate any feedback.
...
> -	if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
> -		return 1;
> -

I think it's right to move the above call, but if you do, your patch
should update Documentation/kprobes.txt, which states:

"If a fault
occurs during execution of kp->pre_handler or kp->post_handler,
>>> or during single-stepping of the probed instruction, <<<
Kprobes calls kp->fault_handler."

>  	if (kcb->kprobe_status & KPROBE_HIT_SS) {
> +		/*
> +		 * We are here because the instruction being single stepped
> +		 * caused a page fault. We reset the current kprobe and the
> +		 * eip points back to the probe address and allow the page
> +		 * fault handler.
> +		 */
>  		resume_execution(cur, regs, kcb);

I agree with Anil that resume_execution() can make adjustments we don't
want (since the probed instruction hasn't completed).  Seems like what
we want is just
	regs->eip = (unsigned long) cur->addr;

>  		regs->eflags |= kcb->kprobe_old_eflags;
>  
>  		reset_current_kprobe();
>  		preempt_enable_no_resched();

Bibo said:
> When single stepped instruction cause page fault, eip will point to
> kprobe copied address, but not probed address, so execution table
> search will fail.

I don't think finding the fixup will be a problem, because we restore
regs->eip to the address of the actual probed instruction before
returning to do_page_fault().

> +	} else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) ||
> +			(kcb-kprobe_status & KPROBE_HIT_ACTIVE)) {
> +		/*
> +		 * We come here because instructions in the pre/post handler
> +		 * caused the page_fault, this could happen if handler tries
> +		 * to access user space by copy_from_user(), get_user() etc.
> +		 * Let the user-specified handler try to fix it first.
> +		 */
> +		if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
> +			return 1;
> +
> +		/*
> +		 * In case the user-specified fault handler returned zero,
> +		 * try to fix up.
> +		 */
> +
> +		if (fixup_exception(regs))
> +			return 1;

I think it's OK to call fixup_exceptions() here, but I believe it's
redundant.  I understood Suparna to say
(http://sourceware.org/ml/systemtap/2006-q1/msg00423.html) that if we
return 0, do_page_fault() will call fixup_exceptions() instead of trying
to bring in the missing page (since it's a kernel instruction -- in a
handler -- that faulted).  Her explanation made sense to me.

We still don't handle the case of an erroneous pre/post-handler that
incurs a fault with no fixup.  (We'd like to do something other than let
the kernel crash.)  I think we'd have to do a sort of setjmp before
calling the handler in order to do that.

Disclaimer: Some or all of this is probably wrong (again). :-}

Jim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-09  7:23             ` Prasanna S Panchamukhi
@ 2006-02-09 16:33               ` Keshavamurthy Anil S
  2006-02-09 21:35               ` Jim Keniston
  1 sibling, 0 replies; 33+ messages in thread
From: Keshavamurthy Anil S @ 2006-02-09 16:33 UTC (permalink / raw)
  To: Prasanna S Panchamukhi
  Cc: Richard J Moore, suparna, Martin Hunt, Jim Keniston, SystemTAP

> 
>     arch/i386/kernel/kprobes.c |   32 +++++++++++++++++++++++++++++---
>     1 files changed, 29 insertions(+), 3 deletions(-)
> 
>            if (kcb->kprobe_status & KPROBE_HIT_SS) {
>    +               /*
>    +                  *  We are here because the instruction being single
>    stepped
>    +                  *  caused a page fault. We reset the current kprobe
>    and the
>    +                 * eip points back to the probe address and allow the
>    page
>    +                * fault handler.
>    +                */
>                    resume_execution(cur, regs, kcb);
resume_execution() tries to fixup the relative IP address and/or
tries to fixup branch address as if we were successfull in single stepping.
I think we just need to point eip back to probed address here.
Also as Bibo pointed out, not sure how do_page_faulut() function can fix up the 
page   we are trying to single step as the current eip in the regs points to 
probed address.

> 
>                    reset_current_kprobe();
Need to handle KPROBE_REENTER case here, i.e 
if(kcb->kprobe_status & KPROBE_REENTER) {
	restore_previous_kprobe();
} else {
	reset_current_kprobe();
}
>                    preempt_enable_no_resched();

-anil

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: kprobe fault handling
  2006-02-09  8:55 Mao, Bibo
@ 2006-02-09 10:22 ` Richard J Moore
  0 siblings, 0 replies; 33+ messages in thread
From: Richard J Moore @ 2006-02-09 10:22 UTC (permalink / raw)
  To: SystemTAP





That needs to be fixed by adjusting the trap stack frame before we exit to
the system.

But I have another two questions:

1) how will the post-handler discover whether the single-stepped
instruction generated an exception? It needs to know this for tracing
purposes.

2) what happens when a fault is caused directly by either the pre- or
post-handlers and not within copy_from_user() in the case that we indicate
that the fault has been handled? Does the pre- or post- handler regain
control? If so how do we avoid re-executing the same faulting instruction?
If not I don't think this implementation is particularly useful as we don't
have a mechanism to carry on with normal pre- or post-handler processing.



- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072


                                                                           
             "Mao, Bibo"                                                   
             <bibo.mao@intel                                               
             .com>                                                      To 
             Sent by:                  <prasanna@in.ibm.com>, Richard J    
             systemtap-owner           Moore/UK/IBM@IBMGB                  
             @sourceware.org                                            cc 
                                       <suparna@in.ibm.com>, "Martin Hunt" 
                                       <hunt@redhat.com>, "Jim Keniston"   
             09/02/2006                <jkenisto@us.ibm.com>, "SystemTAP"  
             08:54                     <systemtap@sources.redhat.com>      
                                                                       bcc 
                                                                           
                                                                   Subject 
                                       RE: kprobe fault handling           
                                                                           
                                                                           




When instruction being single stepped causes page fault, resetting current
kprobe will cause pre_handle function to execute twice. And I think main
aim of set eip point to probed ip is for fixup_exception(regs) in
do_page_fault function. If probed instruction try to access memory not in
VMA table, it will call fixup_exception(regs) to fix it, it will search
exception table and set eip to fixup instruction address.
When single stepped instruction cause page fault, eip will point to kprobe
copied address, but not probed address, so execution table search will
fail.
And I do not whether there is good way to avoid kprobe execution twice and
jump to fixup code when single stepped instruction cause page pault.

bibo,mao

>-----Original Message-----
>From: systemtap-owner@sourceware.org
[mailto:systemtap-owner@sourceware.org]
>On Behalf Of Prasanna S Panchamukhi
>Sent: 2006年2月9日 15:24
>To: Richard J Moore
>Cc: suparna@in.ibm.com; Martin Hunt; Jim Keniston; SystemTAP
>Subject: Re: kprobe fault handling
>
>Hi,
>
>Please find the patch below to provide robust kprobe fault handling.
>Presently this patch is for i386 architecture, will port to other
>architectures if this approach looks ok. This patch is currently
>untested, appreciate any feedback.
>
>Thanks
>Prasanna
>
>This patch provides proper kprobes fault handling, if a user specified
pre/post
>handlers tires to access user address space, through copy_from_user(),
>get_user() etc. The user-specified handler gets called only if the fault
>occurs due to kprobes handlers and if the user-specified handler does
>not fix it, we try to fix it by calling fix_exception().
>The user specified handler will not be called if the fault happens when
>single stepping the original instruction.
>
>Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>
>
> arch/i386/kernel/kprobes.c |   32 +++++++++++++++++++++++++++++---
> 1 files changed, 29 insertions(+), 3 deletions(-)
>
>diff -puN arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix
>arch/i386/kernel/kprobes.c
>---
>linux-2.6.16-rc1-mm5/arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix
>            2006-02-09 12:50:44.000000000 +0530
>+++ linux-2.6.16-rc1-mm5-prasanna/arch/i386/kernel/kprobes.c
2006-02-09
>12:50:44.000000000 +0530
>@@ -543,15 +543,41 @@ static inline int kprobe_fault_handler(s
>            struct kprobe *cur = kprobe_running();
>            struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>
>-           if (cur->fault_handler && cur->fault_handler(cur, regs,
trapnr))
>-                       return 1;
>-
>            if (kcb->kprobe_status & KPROBE_HIT_SS) {
>+                       /*
>+                        * We are here because the instruction being
single stepped
>+                        * caused a page fault. We reset the current
kprobe and the
>+                        * eip points back to the probe address and allow
the page
>+                        * fault handler.
>+                        */
>                        resume_execution(cur, regs, kcb);
>                        regs->eflags |= kcb->kprobe_old_eflags;
>
>                        reset_current_kprobe();
>                        preempt_enable_no_resched();
>+           } else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) ||
>+                                   (kcb-kprobe_status &
KPROBE_HIT_ACTIVE)) {
>+                       /*
>+                        * We come here because instructions in the
pre/post handler
>+                        * caused the page_fault, this could happen if
handler tries
>+                        * to access user space by copy_from_user(),
get_user() etc.
>+                        * Let the user-specified handler try to fix it
first.
>+                        */
>+                       if (cur->fault_handler && cur->fault_handler(cur,
regs, trapnr))
>+                                   return 1;
>+
>+                       /*
>+                        * In case the user-specified fault handler
returned zero,
>+                        * try to fix up.
>+                        */
>+
>+                       if (fixup_exception(regs))
>+                                   return 1;
>+
>+                       /*
>+                        * fixup_exception() could not handle it,
>+                        * Let do_page_fault() fix it.
>+                        */
>            }
>            return 0;
> }
>
>_
>
>On Wed, Feb 08, 2006 at 11:27:53AM +0000, Richard J Moore wrote:
>>
>>
>>
>>
>>
>> systemtap-owner@sourceware.org wrote on 08/02/2006 04:38:31:
>>
>> >
>> > Recalling the way this was supposed to work ... AFAICR
>> >
>> > There are 2 main situations where one could land up into
>> > the page fault handler during kprobe processing:
>> >
>> > (1) We have completed the probe handler and are single-stepping the
>> probed
>> >     instruction. The original instruction being single-stepped causes
a
>> >     page fault. This would typically happen if that instruction issues
a
>> >     memory access that is paged out.
>> >
>>
>> Yes you are correct, the code does do this, but its not supposed to. The
>> pagefault handler was intended solely to protect the system from the
>> probe-handler, not to catch faults in the original instruction.
>>
>> I think what's happened is that two needs have become confused.  Going
back
>> to the original dprobes implementation:
>>
>> The pagefault callback was there to allow the probe-handler to continue
>> even if it touched inaccessible memory.
>> The dprobes interpreter always hooked faults and protected the system
from
>> the probe handler.
>> On single-stepping the original instruction any exception was percolated
up
>> to the systems, however the log buffer would be discarded by the dprobes
>> event handler unless logonfault had been specified. This was implemented
to
>> support tracing and to suppress multiple trace entries for tracepoint
that
>> would temporarily fault, then subsequently retry execution successfully.
>>
>> When we removed the dprobes interpreter to a device drive and created a
>> minimal kernel API set (krpobes) to support the device driver, we needed
to
>> support this capability through the kprobes APIs. So what appears to
have
>> happened is that the single-step fault and the handler fault have become
>> handler through the same mechanism. So the question is: is this
>> functionally correct?
>>
>> They reason why I doubt that it is correct is that any one using the
>> kprobes APIs is going to need a very thorough understanding of the
>> motivation behind this interface to use it correctly. I would propose
>> separating out the notification of single-stepping (original instruction
>> execution) exceptions from self-generated probe-handder exceptions.
>>
>> These are some possible options:
>>
>> 1)
>> leave the page-fault handler to be a called routine that's invoked
instead
>> of the post-handler when single-step generates an exception.
>> create a setjump, longjump mechanism to allow the probe handler to catch
>> any self-generated exception but always protect the rest of the system
from
>> any exception that is not intercepted.
>>
>> 2)
>>
>> leave the page-fault handler as it - a called routine, but don't invoke
it
>> for single-step exceptions.
>> allow the page-fault handler to deal with creating or simulating a
>> setjump/longjump back to the original pre or post handler code.
>> pass a switch to the post-handler to indicate whether the
single-stepping
>> resulted in a exception.
>>
>> 3) a combination of 1 & 2, (which I think I prefer)
>> dispense with the page-fault handler routine.
>> create a setjump/long mechanism for the pre- and post-handler routines
to
>> use if they wish.
>> always shield any unhandled exception that has been generated by the
pre-
>> and post- handlers from the rest of the system
>> indicate via a switch to the post-handler whether the singe-stepping
>> resulted in an exception (other than a debug exception of course :-))
>>
>>
>> Finally in response to how are things broken:
>>
>> Well I've just conducted an experiment on kprobes where I deliberately
>> generate a pagefault in the pre-handler and it appears that my pae-fault
>> handler is never called, be we do seem to be protecting the system from
may
>> pagefault exception.
>>
>>
>> > (2) We are executing the probe handler, and an instruction in the
probe
>> >     handler causes a page fault. This could, for example, happen if
the
>> >     probe handler attempts to access a user-space address by issuing
>> >     copy_from_user or get_user etc, and that address is currently
>> >     paged out or invalid.
>> >
>> > The check KPROBE_HIT_SS enables us to distinguish between these two
>> cases.
>> >
>> > If set, it indicates that (1) must have occured. In which case, we
want
>> > normal page fault handling to continue to that the data accessed by
the
>> > instruction is paged in, and instruction execution continue after
>> > fault handling as would happen in normal execution. However, since
>> > this can cause a sleep, other probe hits etc, we turn off kprobes
>> > related state (hence the resume execution) before letting it continue.
>> Once
>> > the data is paged in, the page fault handler would resume execution
from
>> > the original probed address and thus trap again on the breakpoint
>> instruction.
>> > So we encounter the probe handler a second time, and this time
>> single-stepping
>> > would succeed since the address accessed by the instruction is already
>> > in memory, and finally we would get a single-step trap and the post
>> handler
>> > would be executed.
>> >
>> > In KPROBE_HIT_SS is not set, we enter case (2). Now since we are (or
at
>> > least were) running without interrupts, the default behaviour was to
>> > soft-fail the access, i.e. cause the copy_from_user/get_user call in
>> > the probe handler to fail with -EFAULT. How does this happen ? The
>> > in_atomic() check or its equivalent in do_page_fault succeeds and
hence
>> > control jumps to bad_area_nosemaphore where it finds that the access
>> > happened in kernel mode and hence moves to no_context, where
>> > fixup_exception() takes care of the rest.
>> > More sophisticated or customised behaviour could be achieved by
providing
>> > a ->fault_handler(), we could play some neat tricks (like Richard
>> mentioned)
>> > perform fixups etc.
>> >
>> > I hope that makes it a little clearer.
>> >
>> > Could you now help me understand what is broken now, if at all ?
>> >
>> > Regards
>> > Suparna
>> >
>> > On Tue, Feb 07, 2006 at 11:49:22AM -0800, Jim Keniston wrote:
>> > > On Tue, 2006-02-07 at 09:51, Martin Hunt wrote:
>> > > > On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote:
>> > > > [...]
>> > > > > > > And that's it. kprobe_fault_handler returns 0.  No call to
>> > > > > > > fixup_exceptions()!  So do_page_fault() will have to do
>> > the fixups, but
>> > > > > > > first it will print nasty might_sleep warnings and maybe
>> > actually sleep!
>> > > > >
>> > > > > You have the right idea of running fixup_exceptions() in the
fault
>> > > > > handler, to prevent do_page_fault() from attempting its
>> demand-paging
>> > > > > stuff.  I just think it's OK to have the user's fault handler,
>> rather
>> > > > > than kprobe_fault_handler, run fixup_exceptions.
>> > > >
>> > > > I think if the user's fault handler does not handle the page
fault,
>> > > > kprobes should attempt to do so. Returning without handling the
page
>> > > > fault could leave the system in a bad state. So your proposal
would
>> mean
>> > > > it is necessary to specify a fault handler for any kprobe that
might
>> > > > attempt user-space access or otherwise trigger a page fault,
>> otherwise
>> > > > the system could crash.
>> > >
>> > > I just had a long chat with Richard Moore about this whole topic.  I
>> > > agree with you on this, and I think Richard would, too.
>> > >
>> > > So unless there's a user-specified handler and that handler
specifies
>> > > (by returning 1) that it has handled the exception,
>> > > kprobe_fault_handler() should run fixup_exception(), right?
>> > >
>> > > Now I'm looking, later in that function, at the code (on i386) where
we
>> > > handle an exception while single-stepping.  I don't think
>> > > resume_execution() is the right thing to do here.  We haven't
>> > > successfully executed the probed instruction, and the eip still
points
>> > > at that instruction, right?  I think we're just hosed at this point.
>> > > Comments?
>> > >
>> > > >
>> > > > Any more importantly, fixup_exception() and all the exception
table
>> code
>> > > > are currently not exported so cannot be called from a module.
>> > >
>> > > It's hard to argue with that. :-)
>> > >
>> > > >
>> > > > Martin
>> > >
>> > > Jim
>> > >
>> > >
>>
>
>--
>Prasanna S Panchamukhi
>Linux Technology Center
>India Software Labs, IBM Bangalore
>Email: prasanna@in.ibm.com
>Ph: 91-80-51776329

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: kprobe fault handling
@ 2006-02-09  8:55 Mao, Bibo
  2006-02-09 10:22 ` Richard J Moore
  0 siblings, 1 reply; 33+ messages in thread
From: Mao, Bibo @ 2006-02-09  8:55 UTC (permalink / raw)
  To: prasanna, Richard J Moore; +Cc: suparna, Martin Hunt, Jim Keniston, SystemTAP

When instruction being single stepped causes page fault, resetting current kprobe will cause pre_handle function to execute twice. And I think main aim of set eip point to probed ip is for fixup_exception(regs) in do_page_fault function. If probed instruction try to access memory not in VMA table, it will call fixup_exception(regs) to fix it, it will search exception table and set eip to fixup instruction address.
When single stepped instruction cause page fault, eip will point to kprobe copied address, but not probed address, so execution table search will fail.
And I do not whether there is good way to avoid kprobe execution twice and jump to fixup code when single stepped instruction cause page pault.

bibo,mao

>-----Original Message-----
>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org]
>On Behalf Of Prasanna S Panchamukhi
>Sent: 2006年2月9日 15:24
>To: Richard J Moore
>Cc: suparna@in.ibm.com; Martin Hunt; Jim Keniston; SystemTAP
>Subject: Re: kprobe fault handling
>
>Hi,
>
>Please find the patch below to provide robust kprobe fault handling.
>Presently this patch is for i386 architecture, will port to other
>architectures if this approach looks ok. This patch is currently
>untested, appreciate any feedback.
>
>Thanks
>Prasanna
>
>This patch provides proper kprobes fault handling, if a user specified pre/post
>handlers tires to access user address space, through copy_from_user(),
>get_user() etc. The user-specified handler gets called only if the fault
>occurs due to kprobes handlers and if the user-specified handler does
>not fix it, we try to fix it by calling fix_exception().
>The user specified handler will not be called if the fault happens when
>single stepping the original instruction.
>
>Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>
>
> arch/i386/kernel/kprobes.c |   32 +++++++++++++++++++++++++++++---
> 1 files changed, 29 insertions(+), 3 deletions(-)
>
>diff -puN arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix
>arch/i386/kernel/kprobes.c
>---
>linux-2.6.16-rc1-mm5/arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix
>	2006-02-09 12:50:44.000000000 +0530
>+++ linux-2.6.16-rc1-mm5-prasanna/arch/i386/kernel/kprobes.c	2006-02-09
>12:50:44.000000000 +0530
>@@ -543,15 +543,41 @@ static inline int kprobe_fault_handler(s
> 	struct kprobe *cur = kprobe_running();
> 	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>
>-	if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
>-		return 1;
>-
> 	if (kcb->kprobe_status & KPROBE_HIT_SS) {
>+		/*
>+		 * We are here because the instruction being single stepped
>+		 * caused a page fault. We reset the current kprobe and the
>+		 * eip points back to the probe address and allow the page
>+		 * fault handler.
>+		 */
> 		resume_execution(cur, regs, kcb);
> 		regs->eflags |= kcb->kprobe_old_eflags;
>
> 		reset_current_kprobe();
> 		preempt_enable_no_resched();
>+	} else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) ||
>+			(kcb-kprobe_status & KPROBE_HIT_ACTIVE)) {
>+		/*
>+		 * We come here because instructions in the pre/post handler
>+		 * caused the page_fault, this could happen if handler tries
>+		 * to access user space by copy_from_user(), get_user() etc.
>+		 * Let the user-specified handler try to fix it first.
>+		 */
>+		if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
>+			return 1;
>+
>+		/*
>+		 * In case the user-specified fault handler returned zero,
>+		 * try to fix up.
>+		 */
>+
>+		if (fixup_exception(regs))
>+			return 1;
>+
>+		/*
>+		 * fixup_exception() could not handle it,
>+		 * Let do_page_fault() fix it.
>+		 */
> 	}
> 	return 0;
> }
>
>_
>
>On Wed, Feb 08, 2006 at 11:27:53AM +0000, Richard J Moore wrote:
>>
>>
>>
>>
>>
>> systemtap-owner@sourceware.org wrote on 08/02/2006 04:38:31:
>>
>> >
>> > Recalling the way this was supposed to work ... AFAICR
>> >
>> > There are 2 main situations where one could land up into
>> > the page fault handler during kprobe processing:
>> >
>> > (1) We have completed the probe handler and are single-stepping the
>> probed
>> >     instruction. The original instruction being single-stepped causes a
>> >     page fault. This would typically happen if that instruction issues a
>> >     memory access that is paged out.
>> >
>>
>> Yes you are correct, the code does do this, but its not supposed to. The
>> pagefault handler was intended solely to protect the system from the
>> probe-handler, not to catch faults in the original instruction.
>>
>> I think what's happened is that two needs have become confused.  Going back
>> to the original dprobes implementation:
>>
>> The pagefault callback was there to allow the probe-handler to continue
>> even if it touched inaccessible memory.
>> The dprobes interpreter always hooked faults and protected the system from
>> the probe handler.
>> On single-stepping the original instruction any exception was percolated up
>> to the systems, however the log buffer would be discarded by the dprobes
>> event handler unless logonfault had been specified. This was implemented to
>> support tracing and to suppress multiple trace entries for tracepoint that
>> would temporarily fault, then subsequently retry execution successfully.
>>
>> When we removed the dprobes interpreter to a device drive and created a
>> minimal kernel API set (krpobes) to support the device driver, we needed to
>> support this capability through the kprobes APIs. So what appears to have
>> happened is that the single-step fault and the handler fault have become
>> handler through the same mechanism. So the question is: is this
>> functionally correct?
>>
>> They reason why I doubt that it is correct is that any one using the
>> kprobes APIs is going to need a very thorough understanding of the
>> motivation behind this interface to use it correctly. I would propose
>> separating out the notification of single-stepping (original instruction
>> execution) exceptions from self-generated probe-handder exceptions.
>>
>> These are some possible options:
>>
>> 1)
>> leave the page-fault handler to be a called routine that's invoked instead
>> of the post-handler when single-step generates an exception.
>> create a setjump, longjump mechanism to allow the probe handler to catch
>> any self-generated exception but always protect the rest of the system from
>> any exception that is not intercepted.
>>
>> 2)
>>
>> leave the page-fault handler as it - a called routine, but don't invoke it
>> for single-step exceptions.
>> allow the page-fault handler to deal with creating or simulating a
>> setjump/longjump back to the original pre or post handler code.
>> pass a switch to the post-handler to indicate whether the single-stepping
>> resulted in a exception.
>>
>> 3) a combination of 1 & 2, (which I think I prefer)
>> dispense with the page-fault handler routine.
>> create a setjump/long mechanism for the pre- and post-handler routines to
>> use if they wish.
>> always shield any unhandled exception that has been generated by the pre-
>> and post- handlers from the rest of the system
>> indicate via a switch to the post-handler whether the singe-stepping
>> resulted in an exception (other than a debug exception of course :-))
>>
>>
>> Finally in response to how are things broken:
>>
>> Well I've just conducted an experiment on kprobes where I deliberately
>> generate a pagefault in the pre-handler and it appears that my pae-fault
>> handler is never called, be we do seem to be protecting the system from may
>> pagefault exception.
>>
>>
>> > (2) We are executing the probe handler, and an instruction in the probe
>> >     handler causes a page fault. This could, for example, happen if the
>> >     probe handler attempts to access a user-space address by issuing
>> >     copy_from_user or get_user etc, and that address is currently
>> >     paged out or invalid.
>> >
>> > The check KPROBE_HIT_SS enables us to distinguish between these two
>> cases.
>> >
>> > If set, it indicates that (1) must have occured. In which case, we want
>> > normal page fault handling to continue to that the data accessed by the
>> > instruction is paged in, and instruction execution continue after
>> > fault handling as would happen in normal execution. However, since
>> > this can cause a sleep, other probe hits etc, we turn off kprobes
>> > related state (hence the resume execution) before letting it continue.
>> Once
>> > the data is paged in, the page fault handler would resume execution from
>> > the original probed address and thus trap again on the breakpoint
>> instruction.
>> > So we encounter the probe handler a second time, and this time
>> single-stepping
>> > would succeed since the address accessed by the instruction is already
>> > in memory, and finally we would get a single-step trap and the post
>> handler
>> > would be executed.
>> >
>> > In KPROBE_HIT_SS is not set, we enter case (2). Now since we are (or at
>> > least were) running without interrupts, the default behaviour was to
>> > soft-fail the access, i.e. cause the copy_from_user/get_user call in
>> > the probe handler to fail with -EFAULT. How does this happen ? The
>> > in_atomic() check or its equivalent in do_page_fault succeeds and hence
>> > control jumps to bad_area_nosemaphore where it finds that the access
>> > happened in kernel mode and hence moves to no_context, where
>> > fixup_exception() takes care of the rest.
>> > More sophisticated or customised behaviour could be achieved by providing
>> > a ->fault_handler(), we could play some neat tricks (like Richard
>> mentioned)
>> > perform fixups etc.
>> >
>> > I hope that makes it a little clearer.
>> >
>> > Could you now help me understand what is broken now, if at all ?
>> >
>> > Regards
>> > Suparna
>> >
>> > On Tue, Feb 07, 2006 at 11:49:22AM -0800, Jim Keniston wrote:
>> > > On Tue, 2006-02-07 at 09:51, Martin Hunt wrote:
>> > > > On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote:
>> > > > [...]
>> > > > > > > And that's it. kprobe_fault_handler returns 0.  No call to
>> > > > > > > fixup_exceptions()!  So do_page_fault() will have to do
>> > the fixups, but
>> > > > > > > first it will print nasty might_sleep warnings and maybe
>> > actually sleep!
>> > > > >
>> > > > > You have the right idea of running fixup_exceptions() in the fault
>> > > > > handler, to prevent do_page_fault() from attempting its
>> demand-paging
>> > > > > stuff.  I just think it's OK to have the user's fault handler,
>> rather
>> > > > > than kprobe_fault_handler, run fixup_exceptions.
>> > > >
>> > > > I think if the user's fault handler does not handle the page fault,
>> > > > kprobes should attempt to do so. Returning without handling the page
>> > > > fault could leave the system in a bad state. So your proposal would
>> mean
>> > > > it is necessary to specify a fault handler for any kprobe that might
>> > > > attempt user-space access or otherwise trigger a page fault,
>> otherwise
>> > > > the system could crash.
>> > >
>> > > I just had a long chat with Richard Moore about this whole topic.  I
>> > > agree with you on this, and I think Richard would, too.
>> > >
>> > > So unless there's a user-specified handler and that handler specifies
>> > > (by returning 1) that it has handled the exception,
>> > > kprobe_fault_handler() should run fixup_exception(), right?
>> > >
>> > > Now I'm looking, later in that function, at the code (on i386) where we
>> > > handle an exception while single-stepping.  I don't think
>> > > resume_execution() is the right thing to do here.  We haven't
>> > > successfully executed the probed instruction, and the eip still points
>> > > at that instruction, right?  I think we're just hosed at this point.
>> > > Comments?
>> > >
>> > > >
>> > > > Any more importantly, fixup_exception() and all the exception table
>> code
>> > > > are currently not exported so cannot be called from a module.
>> > >
>> > > It's hard to argue with that. :-)
>> > >
>> > > >
>> > > > Martin
>> > >
>> > > Jim
>> > >
>> > >
>>
>
>--
>Prasanna S Panchamukhi
>Linux Technology Center
>India Software Labs, IBM Bangalore
>Email: prasanna@in.ibm.com
>Ph: 91-80-51776329

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-08 11:32           ` Richard J Moore
@ 2006-02-09  7:23             ` Prasanna S Panchamukhi
  2006-02-09 16:33               ` Keshavamurthy Anil S
  2006-02-09 21:35               ` Jim Keniston
  0 siblings, 2 replies; 33+ messages in thread
From: Prasanna S Panchamukhi @ 2006-02-09  7:23 UTC (permalink / raw)
  To: Richard J Moore; +Cc: suparna, Martin Hunt, Jim Keniston, SystemTAP

Hi,

Please find the patch below to provide robust kprobe fault handling.
Presently this patch is for i386 architecture, will port to other
architectures if this approach looks ok. This patch is currently
untested, appreciate any feedback.

Thanks
Prasanna

This patch provides proper kprobes fault handling, if a user specified pre/post
handlers tires to access user address space, through copy_from_user(),
get_user() etc. The user-specified handler gets called only if the fault
occurs due to kprobes handlers and if the user-specified handler does
not fix it, we try to fix it by calling fix_exception().
The user specified handler will not be called if the fault happens when
single stepping the original instruction.

Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>


 arch/i386/kernel/kprobes.c |   32 +++++++++++++++++++++++++++++---
 1 files changed, 29 insertions(+), 3 deletions(-)

diff -puN arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix arch/i386/kernel/kprobes.c
--- linux-2.6.16-rc1-mm5/arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix	2006-02-09 12:50:44.000000000 +0530
+++ linux-2.6.16-rc1-mm5-prasanna/arch/i386/kernel/kprobes.c	2006-02-09 12:50:44.000000000 +0530
@@ -543,15 +543,41 @@ static inline int kprobe_fault_handler(s
 	struct kprobe *cur = kprobe_running();
 	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
 
-	if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
-		return 1;
-
 	if (kcb->kprobe_status & KPROBE_HIT_SS) {
+		/*
+		 * We are here because the instruction being single stepped
+		 * caused a page fault. We reset the current kprobe and the
+		 * eip points back to the probe address and allow the page
+		 * fault handler.
+		 */
 		resume_execution(cur, regs, kcb);
 		regs->eflags |= kcb->kprobe_old_eflags;
 
 		reset_current_kprobe();
 		preempt_enable_no_resched();
+	} else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) ||
+			(kcb-kprobe_status & KPROBE_HIT_ACTIVE)) {
+		/*
+		 * We come here because instructions in the pre/post handler
+		 * caused the page_fault, this could happen if handler tries
+		 * to access user space by copy_from_user(), get_user() etc.
+		 * Let the user-specified handler try to fix it first.
+		 */
+		if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
+			return 1;
+
+		/*
+		 * In case the user-specified fault handler returned zero,
+		 * try to fix up.
+		 */
+
+		if (fixup_exception(regs))
+			return 1;
+
+		/*
+		 * fixup_exception() could not handle it,
+		 * Let do_page_fault() fix it.
+		 */
 	}
 	return 0;
 }

_

On Wed, Feb 08, 2006 at 11:27:53AM +0000, Richard J Moore wrote:
> 
> 
> 
> 
> 
> systemtap-owner@sourceware.org wrote on 08/02/2006 04:38:31:
> 
> >
> > Recalling the way this was supposed to work ... AFAICR
> >
> > There are 2 main situations where one could land up into
> > the page fault handler during kprobe processing:
> >
> > (1) We have completed the probe handler and are single-stepping the
> probed
> >     instruction. The original instruction being single-stepped causes a
> >     page fault. This would typically happen if that instruction issues a
> >     memory access that is paged out.
> >
> 
> Yes you are correct, the code does do this, but its not supposed to. The
> pagefault handler was intended solely to protect the system from the
> probe-handler, not to catch faults in the original instruction.
> 
> I think what's happened is that two needs have become confused.  Going back
> to the original dprobes implementation:
> 
> The pagefault callback was there to allow the probe-handler to continue
> even if it touched inaccessible memory.
> The dprobes interpreter always hooked faults and protected the system from
> the probe handler.
> On single-stepping the original instruction any exception was percolated up
> to the systems, however the log buffer would be discarded by the dprobes
> event handler unless logonfault had been specified. This was implemented to
> support tracing and to suppress multiple trace entries for tracepoint that
> would temporarily fault, then subsequently retry execution successfully.
> 
> When we removed the dprobes interpreter to a device drive and created a
> minimal kernel API set (krpobes) to support the device driver, we needed to
> support this capability through the kprobes APIs. So what appears to have
> happened is that the single-step fault and the handler fault have become
> handler through the same mechanism. So the question is: is this
> functionally correct?
> 
> They reason why I doubt that it is correct is that any one using the
> kprobes APIs is going to need a very thorough understanding of the
> motivation behind this interface to use it correctly. I would propose
> separating out the notification of single-stepping (original instruction
> execution) exceptions from self-generated probe-handder exceptions.
> 
> These are some possible options:
> 
> 1)
> leave the page-fault handler to be a called routine that's invoked instead
> of the post-handler when single-step generates an exception.
> create a setjump, longjump mechanism to allow the probe handler to catch
> any self-generated exception but always protect the rest of the system from
> any exception that is not intercepted.
> 
> 2)
> 
> leave the page-fault handler as it - a called routine, but don't invoke it
> for single-step exceptions.
> allow the page-fault handler to deal with creating or simulating a
> setjump/longjump back to the original pre or post handler code.
> pass a switch to the post-handler to indicate whether the single-stepping
> resulted in a exception.
> 
> 3) a combination of 1 & 2, (which I think I prefer)
> dispense with the page-fault handler routine.
> create a setjump/long mechanism for the pre- and post-handler routines to
> use if they wish.
> always shield any unhandled exception that has been generated by the pre-
> and post- handlers from the rest of the system
> indicate via a switch to the post-handler whether the singe-stepping
> resulted in an exception (other than a debug exception of course :-))
> 
> 
> Finally in response to how are things broken:
> 
> Well I've just conducted an experiment on kprobes where I deliberately
> generate a pagefault in the pre-handler and it appears that my pae-fault
> handler is never called, be we do seem to be protecting the system from may
> pagefault exception.
> 
> 
> > (2) We are executing the probe handler, and an instruction in the probe
> >     handler causes a page fault. This could, for example, happen if the
> >     probe handler attempts to access a user-space address by issuing
> >     copy_from_user or get_user etc, and that address is currently
> >     paged out or invalid.
> >
> > The check KPROBE_HIT_SS enables us to distinguish between these two
> cases.
> >
> > If set, it indicates that (1) must have occured. In which case, we want
> > normal page fault handling to continue to that the data accessed by the
> > instruction is paged in, and instruction execution continue after
> > fault handling as would happen in normal execution. However, since
> > this can cause a sleep, other probe hits etc, we turn off kprobes
> > related state (hence the resume execution) before letting it continue.
> Once
> > the data is paged in, the page fault handler would resume execution from
> > the original probed address and thus trap again on the breakpoint
> instruction.
> > So we encounter the probe handler a second time, and this time
> single-stepping
> > would succeed since the address accessed by the instruction is already
> > in memory, and finally we would get a single-step trap and the post
> handler
> > would be executed.
> >
> > In KPROBE_HIT_SS is not set, we enter case (2). Now since we are (or at
> > least were) running without interrupts, the default behaviour was to
> > soft-fail the access, i.e. cause the copy_from_user/get_user call in
> > the probe handler to fail with -EFAULT. How does this happen ? The
> > in_atomic() check or its equivalent in do_page_fault succeeds and hence
> > control jumps to bad_area_nosemaphore where it finds that the access
> > happened in kernel mode and hence moves to no_context, where
> > fixup_exception() takes care of the rest.
> > More sophisticated or customised behaviour could be achieved by providing
> > a ->fault_handler(), we could play some neat tricks (like Richard
> mentioned)
> > perform fixups etc.
> >
> > I hope that makes it a little clearer.
> >
> > Could you now help me understand what is broken now, if at all ?
> >
> > Regards
> > Suparna
> >
> > On Tue, Feb 07, 2006 at 11:49:22AM -0800, Jim Keniston wrote:
> > > On Tue, 2006-02-07 at 09:51, Martin Hunt wrote:
> > > > On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote:
> > > > [...]
> > > > > > > And that's it. kprobe_fault_handler returns 0.  No call to
> > > > > > > fixup_exceptions()!  So do_page_fault() will have to do
> > the fixups, but
> > > > > > > first it will print nasty might_sleep warnings and maybe
> > actually sleep!
> > > > >
> > > > > You have the right idea of running fixup_exceptions() in the fault
> > > > > handler, to prevent do_page_fault() from attempting its
> demand-paging
> > > > > stuff.  I just think it's OK to have the user's fault handler,
> rather
> > > > > than kprobe_fault_handler, run fixup_exceptions.
> > > >
> > > > I think if the user's fault handler does not handle the page fault,
> > > > kprobes should attempt to do so. Returning without handling the page
> > > > fault could leave the system in a bad state. So your proposal would
> mean
> > > > it is necessary to specify a fault handler for any kprobe that might
> > > > attempt user-space access or otherwise trigger a page fault,
> otherwise
> > > > the system could crash.
> > >
> > > I just had a long chat with Richard Moore about this whole topic.  I
> > > agree with you on this, and I think Richard would, too.
> > >
> > > So unless there's a user-specified handler and that handler specifies
> > > (by returning 1) that it has handled the exception,
> > > kprobe_fault_handler() should run fixup_exception(), right?
> > >
> > > Now I'm looking, later in that function, at the code (on i386) where we
> > > handle an exception while single-stepping.  I don't think
> > > resume_execution() is the right thing to do here.  We haven't
> > > successfully executed the probed instruction, and the eip still points
> > > at that instruction, right?  I think we're just hosed at this point.
> > > Comments?
> > >
> > > >
> > > > Any more importantly, fixup_exception() and all the exception table
> code
> > > > are currently not exported so cannot be called from a module.
> > >
> > > It's hard to argue with that. :-)
> > >
> > > >
> > > > Martin
> > >
> > > Jim
> > >
> > >
> 

-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-51776329

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-08  4:38         ` Suparna Bhattacharya
@ 2006-02-08 11:32           ` Richard J Moore
  2006-02-09  7:23             ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 33+ messages in thread
From: Richard J Moore @ 2006-02-08 11:32 UTC (permalink / raw)
  To: suparna; +Cc: Martin Hunt, Jim Keniston, SystemTAP






systemtap-owner@sourceware.org wrote on 08/02/2006 04:38:31:

>
> Recalling the way this was supposed to work ... AFAICR
>
> There are 2 main situations where one could land up into
> the page fault handler during kprobe processing:
>
> (1) We have completed the probe handler and are single-stepping the
probed
>     instruction. The original instruction being single-stepped causes a
>     page fault. This would typically happen if that instruction issues a
>     memory access that is paged out.
>

Yes you are correct, the code does do this, but its not supposed to. The
pagefault handler was intended solely to protect the system from the
probe-handler, not to catch faults in the original instruction.

I think what's happened is that two needs have become confused.  Going back
to the original dprobes implementation:

The pagefault callback was there to allow the probe-handler to continue
even if it touched inaccessible memory.
The dprobes interpreter always hooked faults and protected the system from
the probe handler.
On single-stepping the original instruction any exception was percolated up
to the systems, however the log buffer would be discarded by the dprobes
event handler unless logonfault had been specified. This was implemented to
support tracing and to suppress multiple trace entries for tracepoint that
would temporarily fault, then subsequently retry execution successfully.

When we removed the dprobes interpreter to a device drive and created a
minimal kernel API set (krpobes) to support the device driver, we needed to
support this capability through the kprobes APIs. So what appears to have
happened is that the single-step fault and the handler fault have become
handler through the same mechanism. So the question is: is this
functionally correct?

They reason why I doubt that it is correct is that any one using the
kprobes APIs is going to need a very thorough understanding of the
motivation behind this interface to use it correctly. I would propose
separating out the notification of single-stepping (original instruction
execution) exceptions from self-generated probe-handder exceptions.

These are some possible options:

1)
leave the page-fault handler to be a called routine that's invoked instead
of the post-handler when single-step generates an exception.
create a setjump, longjump mechanism to allow the probe handler to catch
any self-generated exception but always protect the rest of the system from
any exception that is not intercepted.

2)

leave the page-fault handler as it - a called routine, but don't invoke it
for single-step exceptions.
allow the page-fault handler to deal with creating or simulating a
setjump/longjump back to the original pre or post handler code.
pass a switch to the post-handler to indicate whether the single-stepping
resulted in a exception.

3) a combination of 1 & 2, (which I think I prefer)
dispense with the page-fault handler routine.
create a setjump/long mechanism for the pre- and post-handler routines to
use if they wish.
always shield any unhandled exception that has been generated by the pre-
and post- handlers from the rest of the system
indicate via a switch to the post-handler whether the singe-stepping
resulted in an exception (other than a debug exception of course :-))


Finally in response to how are things broken:

Well I've just conducted an experiment on kprobes where I deliberately
generate a pagefault in the pre-handler and it appears that my pae-fault
handler is never called, be we do seem to be protecting the system from may
pagefault exception.


> (2) We are executing the probe handler, and an instruction in the probe
>     handler causes a page fault. This could, for example, happen if the
>     probe handler attempts to access a user-space address by issuing
>     copy_from_user or get_user etc, and that address is currently
>     paged out or invalid.
>
> The check KPROBE_HIT_SS enables us to distinguish between these two
cases.
>
> If set, it indicates that (1) must have occured. In which case, we want
> normal page fault handling to continue to that the data accessed by the
> instruction is paged in, and instruction execution continue after
> fault handling as would happen in normal execution. However, since
> this can cause a sleep, other probe hits etc, we turn off kprobes
> related state (hence the resume execution) before letting it continue.
Once
> the data is paged in, the page fault handler would resume execution from
> the original probed address and thus trap again on the breakpoint
instruction.
> So we encounter the probe handler a second time, and this time
single-stepping
> would succeed since the address accessed by the instruction is already
> in memory, and finally we would get a single-step trap and the post
handler
> would be executed.
>
> In KPROBE_HIT_SS is not set, we enter case (2). Now since we are (or at
> least were) running without interrupts, the default behaviour was to
> soft-fail the access, i.e. cause the copy_from_user/get_user call in
> the probe handler to fail with -EFAULT. How does this happen ? The
> in_atomic() check or its equivalent in do_page_fault succeeds and hence
> control jumps to bad_area_nosemaphore where it finds that the access
> happened in kernel mode and hence moves to no_context, where
> fixup_exception() takes care of the rest.
> More sophisticated or customised behaviour could be achieved by providing
> a ->fault_handler(), we could play some neat tricks (like Richard
mentioned)
> perform fixups etc.
>
> I hope that makes it a little clearer.
>
> Could you now help me understand what is broken now, if at all ?
>
> Regards
> Suparna
>
> On Tue, Feb 07, 2006 at 11:49:22AM -0800, Jim Keniston wrote:
> > On Tue, 2006-02-07 at 09:51, Martin Hunt wrote:
> > > On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote:
> > > [...]
> > > > > > And that's it. kprobe_fault_handler returns 0.  No call to
> > > > > > fixup_exceptions()!  So do_page_fault() will have to do
> the fixups, but
> > > > > > first it will print nasty might_sleep warnings and maybe
> actually sleep!
> > > >
> > > > You have the right idea of running fixup_exceptions() in the fault
> > > > handler, to prevent do_page_fault() from attempting its
demand-paging
> > > > stuff.  I just think it's OK to have the user's fault handler,
rather
> > > > than kprobe_fault_handler, run fixup_exceptions.
> > >
> > > I think if the user's fault handler does not handle the page fault,
> > > kprobes should attempt to do so. Returning without handling the page
> > > fault could leave the system in a bad state. So your proposal would
mean
> > > it is necessary to specify a fault handler for any kprobe that might
> > > attempt user-space access or otherwise trigger a page fault,
otherwise
> > > the system could crash.
> >
> > I just had a long chat with Richard Moore about this whole topic.  I
> > agree with you on this, and I think Richard would, too.
> >
> > So unless there's a user-specified handler and that handler specifies
> > (by returning 1) that it has handled the exception,
> > kprobe_fault_handler() should run fixup_exception(), right?
> >
> > Now I'm looking, later in that function, at the code (on i386) where we
> > handle an exception while single-stepping.  I don't think
> > resume_execution() is the right thing to do here.  We haven't
> > successfully executed the probed instruction, and the eip still points
> > at that instruction, right?  I think we're just hosed at this point.
> > Comments?
> >
> > >
> > > Any more importantly, fixup_exception() and all the exception table
code
> > > are currently not exported so cannot be called from a module.
> >
> > It's hard to argue with that. :-)
> >
> > >
> > > Martin
> >
> > Jim
> >
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-07 19:49       ` Jim Keniston
@ 2006-02-08  4:38         ` Suparna Bhattacharya
  2006-02-08 11:32           ` Richard J Moore
  0 siblings, 1 reply; 33+ messages in thread
From: Suparna Bhattacharya @ 2006-02-08  4:38 UTC (permalink / raw)
  To: Jim Keniston; +Cc: Martin Hunt, SystemTAP


Recalling the way this was supposed to work ... AFAICR

There are 2 main situations where one could land up into
the page fault handler during kprobe processing:

(1) We have completed the probe handler and are single-stepping the probed
    instruction. The original instruction being single-stepped causes a
    page fault. This would typically happen if that instruction issues a
    memory access that is paged out.

(2) We are executing the probe handler, and an instruction in the probe
    handler causes a page fault. This could, for example, happen if the
    probe handler attempts to access a user-space address by issuing
    copy_from_user or get_user etc, and that address is currently
    paged out or invalid.

The check KPROBE_HIT_SS enables us to distinguish between these two cases.

If set, it indicates that (1) must have occured. In which case, we want
normal page fault handling to continue to that the data accessed by the
instruction is paged in, and instruction execution continue after
fault handling as would happen in normal execution. However, since
this can cause a sleep, other probe hits etc, we turn off kprobes
related state (hence the resume execution) before letting it continue. Once
the data is paged in, the page fault handler would resume execution from
the original probed address and thus trap again on the breakpoint instruction.
So we encounter the probe handler a second time, and this time single-stepping
would succeed since the address accessed by the instruction is already
in memory, and finally we would get a single-step trap and the post handler
would be executed. 

In KPROBE_HIT_SS is not set, we enter case (2). Now since we are (or at
least were) running without interrupts, the default behaviour was to
soft-fail the access, i.e. cause the copy_from_user/get_user call in
the probe handler to fail with -EFAULT. How does this happen ? The
in_atomic() check or its equivalent in do_page_fault succeeds and hence
control jumps to bad_area_nosemaphore where it finds that the access
happened in kernel mode and hence moves to no_context, where
fixup_exception() takes care of the rest.
More sophisticated or customised behaviour could be achieved by providing
a ->fault_handler(), we could play some neat tricks (like Richard mentioned)
perform fixups etc.

I hope that makes it a little clearer.

Could you now help me understand what is broken now, if at all ?

Regards
Suparna

On Tue, Feb 07, 2006 at 11:49:22AM -0800, Jim Keniston wrote:
> On Tue, 2006-02-07 at 09:51, Martin Hunt wrote:
> > On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote:
> > [...]
> > > > > And that's it. kprobe_fault_handler returns 0.  No call to
> > > > > fixup_exceptions()!  So do_page_fault() will have to do the fixups, but
> > > > > first it will print nasty might_sleep warnings and maybe actually sleep!
> > > 
> > > You have the right idea of running fixup_exceptions() in the fault
> > > handler, to prevent do_page_fault() from attempting its demand-paging
> > > stuff.  I just think it's OK to have the user's fault handler, rather
> > > than kprobe_fault_handler, run fixup_exceptions.
> > 
> > I think if the user's fault handler does not handle the page fault,
> > kprobes should attempt to do so. Returning without handling the page
> > fault could leave the system in a bad state. So your proposal would mean
> > it is necessary to specify a fault handler for any kprobe that might
> > attempt user-space access or otherwise trigger a page fault, otherwise
> > the system could crash.
> 
> I just had a long chat with Richard Moore about this whole topic.  I
> agree with you on this, and I think Richard would, too.
> 
> So unless there's a user-specified handler and that handler specifies
> (by returning 1) that it has handled the exception,
> kprobe_fault_handler() should run fixup_exception(), right?
> 
> Now I'm looking, later in that function, at the code (on i386) where we
> handle an exception while single-stepping.  I don't think
> resume_execution() is the right thing to do here.  We haven't
> successfully executed the probed instruction, and the eip still points
> at that instruction, right?  I think we're just hosed at this point. 
> Comments?
> 
> > 
> > Any more importantly, fixup_exception() and all the exception table code
> > are currently not exported so cannot be called from a module.
> 
> It's hard to argue with that. :-)
> 
> > 
> > Martin
> 
> Jim
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: kprobe fault handling
@ 2006-02-07 22:19 Keshavamurthy, Anil S
  0 siblings, 0 replies; 33+ messages in thread
From: Keshavamurthy, Anil S @ 2006-02-07 22:19 UTC (permalink / raw)
  To: Martin Hunt; +Cc: Jim Keniston, systemtap

>Anil. Did you read this thread, starting at
>http://sources.redhat.com/ml/systemtap/2006-q1/msg00392.html
>
>How will removing page fault handling fix our broken page fault
>handling?
>
I am just removing the broken implementation of kprobe page fault
handling for now till we have a clean solution.
At least this will give some performance boost to distro kernel without
any feature loss :-)

-Anil

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: kprobe fault handling
  2006-02-07 20:36 Keshavamurthy, Anil S
@ 2006-02-07 20:48 ` Martin Hunt
  0 siblings, 0 replies; 33+ messages in thread
From: Martin Hunt @ 2006-02-07 20:48 UTC (permalink / raw)
  To: Keshavamurthy, Anil S; +Cc: Jim Keniston, systemtap

On Tue, 2006-02-07 at 12:35 -0800, Keshavamurthy, Anil S wrote:
> >I just had a long chat with Richard Moore about this whole topic.  I
> >agree with you on this, and I think Richard would, too.
> >
> >So unless there's a user-specified handler and that handler specifies
> >(by returning 1) that it has handled the exception,
> >kprobe_fault_handler() should run fixup_exception(), right?
> >
> >Now I'm looking, later in that function, at the code (on i386) where we
> >handle an exception while single-stepping.  I don't think
> >resume_execution() is the right thing to do here.  We haven't
> >successfully executed the probed instruction, and the eip still points
> >at that instruction, right?  I think we're just hosed at this point. 
> >Comments?
> I agree with your comments and we need a better fix. 
> Currently for RHEL4 release I am inclined to remove 
> DIE_PAGE_FAULT switch case as this at least improves 
> the performance.

Anil. Did you read this thread, starting at
http://sources.redhat.com/ml/systemtap/2006-q1/msg00392.html

How will removing page fault handling fix our broken page fault
handling?

Martin



^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: kprobe fault handling
@ 2006-02-07 20:36 Keshavamurthy, Anil S
  2006-02-07 20:48 ` Martin Hunt
  0 siblings, 1 reply; 33+ messages in thread
From: Keshavamurthy, Anil S @ 2006-02-07 20:36 UTC (permalink / raw)
  To: Jim Keniston, Martin Hunt; +Cc: SystemTAP

>I just had a long chat with Richard Moore about this whole topic.  I
>agree with you on this, and I think Richard would, too.
>
>So unless there's a user-specified handler and that handler specifies
>(by returning 1) that it has handled the exception,
>kprobe_fault_handler() should run fixup_exception(), right?
>
>Now I'm looking, later in that function, at the code (on i386) where we
>handle an exception while single-stepping.  I don't think
>resume_execution() is the right thing to do here.  We haven't
>successfully executed the probed instruction, and the eip still points
>at that instruction, right?  I think we're just hosed at this point. 
>Comments?
I agree with your comments and we need a better fix. 
Currently for RHEL4 release I am inclined to remove 
DIE_PAGE_FAULT switch case as this at least improves 
the performance.

-Anil

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-07 17:50     ` Martin Hunt
@ 2006-02-07 19:49       ` Jim Keniston
  2006-02-08  4:38         ` Suparna Bhattacharya
  0 siblings, 1 reply; 33+ messages in thread
From: Jim Keniston @ 2006-02-07 19:49 UTC (permalink / raw)
  To: Martin Hunt; +Cc: SystemTAP

On Tue, 2006-02-07 at 09:51, Martin Hunt wrote:
> On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote:
> [...]
> > > > And that's it. kprobe_fault_handler returns 0.  No call to
> > > > fixup_exceptions()!  So do_page_fault() will have to do the fixups, but
> > > > first it will print nasty might_sleep warnings and maybe actually sleep!
> > 
> > You have the right idea of running fixup_exceptions() in the fault
> > handler, to prevent do_page_fault() from attempting its demand-paging
> > stuff.  I just think it's OK to have the user's fault handler, rather
> > than kprobe_fault_handler, run fixup_exceptions.
> 
> I think if the user's fault handler does not handle the page fault,
> kprobes should attempt to do so. Returning without handling the page
> fault could leave the system in a bad state. So your proposal would mean
> it is necessary to specify a fault handler for any kprobe that might
> attempt user-space access or otherwise trigger a page fault, otherwise
> the system could crash.

I just had a long chat with Richard Moore about this whole topic.  I
agree with you on this, and I think Richard would, too.

So unless there's a user-specified handler and that handler specifies
(by returning 1) that it has handled the exception,
kprobe_fault_handler() should run fixup_exception(), right?

Now I'm looking, later in that function, at the code (on i386) where we
handle an exception while single-stepping.  I don't think
resume_execution() is the right thing to do here.  We haven't
successfully executed the probed instruction, and the eip still points
at that instruction, right?  I think we're just hosed at this point. 
Comments?

> 
> Any more importantly, fixup_exception() and all the exception table code
> are currently not exported so cannot be called from a module.

It's hard to argue with that. :-)

> 
> Martin

Jim


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-07 17:31   ` Jim Keniston
@ 2006-02-07 17:50     ` Martin Hunt
  2006-02-07 19:49       ` Jim Keniston
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-07 17:50 UTC (permalink / raw)
  To: Jim Keniston; +Cc: systemtap

On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote:
[...]
> > > And that's it. kprobe_fault_handler returns 0.  No call to
> > > fixup_exceptions()!  So do_page_fault() will have to do the fixups, but
> > > first it will print nasty might_sleep warnings and maybe actually sleep!
> 
> You have the right idea of running fixup_exceptions() in the fault
> handler, to prevent do_page_fault() from attempting its demand-paging
> stuff.  I just think it's OK to have the user's fault handler, rather
> than kprobe_fault_handler, run fixup_exceptions.

I think if the user's fault handler does not handle the page fault,
kprobes should attempt to do so. Returning without handling the page
fault could leave the system in a bad state. So your proposal would mean
it is necessary to specify a fault handler for any kprobe that might
attempt user-space access or otherwise trigger a page fault, otherwise
the system could crash.

Any more importantly, fixup_exception() and all the exception table code
are currently not exported so cannot be called from a module.

Martin


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-07  0:51 ` Jim Keniston
@ 2006-02-07 17:31   ` Jim Keniston
  2006-02-07 17:50     ` Martin Hunt
  0 siblings, 1 reply; 33+ messages in thread
From: Jim Keniston @ 2006-02-07 17:31 UTC (permalink / raw)
  To: Martin Hunt; +Cc: SystemTAP

On Mon, 2006-02-06 at 16:50, Jim Keniston wrote:
> Hi, Martin.  This is the subject of bugzilla #1303, which is assigned to
> Prasanna.  I believe that Prasanna is also the author of the original
> kprobe_fault_handler code in kprobes, so he may be able to provide
> insight that I don't have.
> 
> However, I've thought about this some, so here goes...

After rethinking this, I believe that no change of kprobes is
necessary.  If the pre- or post-handler does user-space accesses using
functions that include fixups, then it should be sufficient for the
corresponding fault-handler to call fixup_exception().  Details below.

> 
> On Mon, 2006-02-06 at 11:50, Martin Hunt wrote:
> > I've been trying to understand how kprobes fault handling is supposed to
> > work and why it isn't doing what I thought it did.
> > 
[Snipped some analysis from Martin, which I still believe to be
correct.]
> 
> > Question: What
> > do we imagine a probe-specific page fault handler would do?  Why is it
> > useful?
> 
> If an attempted memory access failed, the fault handler might be able to
> clean up what the pre_handler or post_handler was trying to do.
> 
> For example, consider user-space return probes (which is currently hung
> up on a couple of issues, including user-space access).  When we enter
> the probed function, the uretprobe version of arch_prepare_kretprobe()
> has to do the following:
> 1. Reserve a kretprobe_instance.
> 2. Save a copy of the return address (a read from user space).
> 3. Replace the return address with the uretprobe trampoline address (a
> write to user space).
> 4. Hash the kretprobe_instance.
> If the page containing the return address is not in memory (a very
> unlikely scenario, but one which I believe we must handle) then we'll
> get a page fault on #2 or #3.

OK so far.

> The corresponding fault handler could
> free up the kretprobe_instance reserved in step #1.

No.  The fault handler should look something like this:
int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
	if (trapnr == 14) {	/* Fix this -- i386-specific */
		/* page fault */
		return fixup_exception(regs);
	}
	return 0;
}

fixup_exceptions will adjust regs->eip to the code in the
user-read/write function that returns -EFAULT.  Returning normally from
the page fault will leave us right where we want to be.

So in steps 2-3 from arch_prepare_uretprobe above, on each call to the
appropriate user-space access function (perhaps __get_user() for step
2), I'd just check to see if it returns -EFAULT.

[Snipped some stuff that I now believe to be bogus.]

> 
> I understand what fixup_exceptions() does, but it's not clear to me how
> we can use it here.

It's clearer to me now, I think. :-}

> I guess if all user-space access by kprobes
> handlers used the same user-read and user-write functions, then we'd
> have a fixup for the user-read function and one for the user-write, and
> each of these could vector into the aforementioned longjmp.

Well, no, the fixups would vector to code that returns -EFAULT, as most
such fixups do.

> 
> Do you envision that all user-space accesses would be via functions like
> *_from_user() in the SystemTap runtime?

We should reconsider using user-space access functions that return
-EFAULT when the page isn't resident.

> 
> > 
> > Then there is this code, which I don't understand
> > 	if (kcb->kprobe_status & KPROBE_HIT_SS) {
> > 		resume_execution(cur, regs, kcb);
> > 		regs->eflags |= kcb->kprobe_old_eflags;
> > 
> > 		reset_current_kprobe();
> > 		preempt_enable_no_resched();
> > 	}
> > 
> 
> I think KPROBE_HIT_SS means that the fault happened while the probed
> instruction was being single-stepped.

Still correct, I think.

> Beyond that, I'm not sure what's
> going on.

I think I do now.

> 
> > And that's it. kprobe_fault_handler returns 0.  No call to
> > fixup_exceptions()!  So do_page_fault() will have to do the fixups, but
> > first it will print nasty might_sleep warnings and maybe actually sleep!

You have the right idea of running fixup_exceptions() in the fault
handler, to prevent do_page_fault() from attempting its demand-paging
stuff.  I just think it's OK to have the user's fault handler, rather
than kprobe_fault_handler, run fixup_exceptions.

> > 
> > I could have sworn this was not the case previously but it has been a
> > very long time since I have looked at the code at this level.  Anyway,
> > this MUST be fixed. 
> > 
> > Martin
> 
> I agree that we need to resolve this.  Thanks for getting this
> discussion rolling again.
> 
> Jim

Jim again

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2006-02-06 19:49 Martin Hunt
@ 2006-02-07  0:51 ` Jim Keniston
  2006-02-07 17:31   ` Jim Keniston
  0 siblings, 1 reply; 33+ messages in thread
From: Jim Keniston @ 2006-02-07  0:51 UTC (permalink / raw)
  To: Martin Hunt; +Cc: SystemTAP

Hi, Martin.  This is the subject of bugzilla #1303, which is assigned to
Prasanna.  I believe that Prasanna is also the author of the original
kprobe_fault_handler code in kprobes, so he may be able to provide
insight that I don't have.

However, I've thought about this some, so here goes...

On Mon, 2006-02-06 at 11:50, Martin Hunt wrote:
> I've been trying to understand how kprobes fault handling is supposed to
> work and why it isn't doing what I thought it did.
> 
> When page faults happen, do_page_fault() almost immediately calls
> notify_die(DIE_PAGE_FAULT,...) This calls the notifier chain which calls
> kprobe_exceptions_notify(). This calls kprobe_fault_handler().
> 
> kprobe_fault_handler() checkes to see if there is a specific fault
> fandler for that kprobe, and if there is, it calls it.

I believe that your analysis is correct.

> Question: What
> do we imagine a probe-specific page fault handler would do?  Why is it
> useful?

If an attempted memory access failed, the fault handler might be able to
clean up what the pre_handler or post_handler was trying to do.

For example, consider user-space return probes (which is currently hung
up on a couple of issues, including user-space access).  When we enter
the probed function, the uretprobe version of arch_prepare_kretprobe()
has to do the following:
1. Reserve a kretprobe_instance.
2. Save a copy of the return address (a read from user space).
3. Replace the return address with the uretprobe trampoline address (a
write to user space).
4. Hash the kretprobe_instance.
If the page containing the return address is not in memory (a very
unlikely scenario, but one which I believe we must handle) then we'll
get a page fault on #2 or #3.  The corresponding fault handler could
free up the kretprobe_instance reserved in step #1.

Of course, the obvious question is, what does the fault handler do
then?  We don't want to return as if the memory access were successful,
because it wasn't.  We don't want to let the page be demand-paged back
in, because that might cause us to sleep -- a definite no-no.  But those
seem to be the only two possible outcomes, given the way kprobes
currently works.

The "right" thing to do, perhaps, is for kprobe_handler to notice that
the kprobe in question has a fault handler is associated with it, and do
a sort of setjmp before calling the pre_handler.  Then the fault handler
could end in a corresponding longjmp.  (The longjmp would bypass several
stack frames, including those of the fault handler,
kprobe_fault_handler, kprobe_exceptions_notify, notify_die and friends,
do_page_fault, and the pre_handler.)

I understand what fixup_exceptions() does, but it's not clear to me how
we can use it here.  I guess if all user-space access by kprobes
handlers used the same user-read and user-write functions, then we'd
have a fixup for the user-read function and one for the user-write, and
each of these could vector into the aforementioned longjmp.

Do you envision that all user-space accesses would be via functions like
*_from_user() in the SystemTap runtime?

> 
> Then there is this code, which I don't understand
> 	if (kcb->kprobe_status & KPROBE_HIT_SS) {
> 		resume_execution(cur, regs, kcb);
> 		regs->eflags |= kcb->kprobe_old_eflags;
> 
> 		reset_current_kprobe();
> 		preempt_enable_no_resched();
> 	}
> 

I think KPROBE_HIT_SS means that the fault happened while the probed
instruction was being single-stepped.  Beyond that, I'm not sure what's
going on.

> And that's it. kprobe_fault_handler returns 0.  No call to
> fixup_exceptions()!  So do_page_fault() will have to do the fixups, but
> first it will print nasty might_sleep warnings and maybe actually sleep!
> 
> I could have sworn this was not the case previously but it has been a
> very long time since I have looked at the code at this level.  Anyway,
> this MUST be fixed. 
> 
> Martin

I agree that we need to resolve this.  Thanks for getting this
discussion rolling again.

Jim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* kprobe fault handling
@ 2006-02-06 19:49 Martin Hunt
  2006-02-07  0:51 ` Jim Keniston
  0 siblings, 1 reply; 33+ messages in thread
From: Martin Hunt @ 2006-02-06 19:49 UTC (permalink / raw)
  To: systemtap

I've been trying to understand how kprobes fault handling is supposed to
work and why it isn't doing what I thought it did.

When page faults happen, do_page_fault() almost immediately calls
notify_die(DIE_PAGE_FAULT,...) This calls the notifier chain which calls
kprobe_exceptions_notify(). This calls kprobe_fault_handler().

kprobe_fault_handler() checkes to see if there is a specific fault
fandler for that kprobe, and if there is, it calls it.  Question: What
do we imagine a probe-specific page fault handler would do?  Why is it
useful?

Then there is this code, which I don't understand
	if (kcb->kprobe_status & KPROBE_HIT_SS) {
		resume_execution(cur, regs, kcb);
		regs->eflags |= kcb->kprobe_old_eflags;

		reset_current_kprobe();
		preempt_enable_no_resched();
	}

And that's it. kprobe_fault_handler returns 0.  No call to
fixup_exceptions()!  So do_page_fault() will have to do the fixups, but
first it will print nasty might_sleep warnings and maybe actually sleep!

I could have sworn this was not the case previously but it has been a
very long time since I have looked at the code at this level.  Anyway,
this MUST be fixed. 

Martin


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2005-09-08 11:52   ` Prasanna S Panchamukhi
@ 2005-09-08 17:42     ` Frank Ch. Eigler
  0 siblings, 0 replies; 33+ messages in thread
From: Frank Ch. Eigler @ 2005-09-08 17:42 UTC (permalink / raw)
  To: Prasanna S Panchamukhi; +Cc: systemtap

Hi -

> [...]
> > Is there a modules-accessible kprobes API for disarming a probe on the
> > fly?
> 
> There is a arch specific routine to disable the probes as shown below.
> void arch_disarm_kprobe(struct kprobe *p) [...]
> Is it enough to export the above routine in all architectures?

I don't know.  We'd need some plain kprobes tests to see what happens
if we "return 1" from a fault_handler that does such a disarm.  Might
the processor return to the fault-causing instruction in the buggy
pre_handler?

- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2005-09-06 15:09 ` Frank Ch. Eigler
@ 2005-09-08 11:52   ` Prasanna S Panchamukhi
  2005-09-08 17:42     ` Frank Ch. Eigler
  0 siblings, 1 reply; 33+ messages in thread
From: Prasanna S Panchamukhi @ 2005-09-08 11:52 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Hi Frank,

On Tue, Sep 06, 2005 at 11:09:13AM -0400, Frank Ch. Eigler wrote:
> 
> prasanna wrote:
> 
> > [...]  yes, in situtations when fault_handler is executed, users can
> > either correct the faulty instruction and singlestep again or he can
> > just replace back the original instruction, disarm the probe and
> > continue.  [...]
> 
> Is there a modules-accessible kprobes API for disarming a probe on the
> fly?

There is a arch specific routine to disable the probes as shown below.
void arch_disarm_kprobe(struct kprobe *p)
{
	*p->addr = p->opcode;
	flush_icache_range((unsigned long) p->addr,
			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
}


> For example, if a probe handler does something crazy and branches to
> pc=0, I'd like to unwind/disable the handler, signal the situation,
> and let the kernel run along as if the probe wasn't there in the first
> place.  It looks like this may not be all there now, so I will disable
> the new code in the translator code.

Is it enough to export the above routine in all architectures?


Thanks
Prasanna

-- 

Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Ph: 91-80-25044636
<prasanna@in.ibm.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
  2005-09-06 12:56 Prasanna S Panchamukhi
@ 2005-09-06 15:09 ` Frank Ch. Eigler
  2005-09-08 11:52   ` Prasanna S Panchamukhi
  0 siblings, 1 reply; 33+ messages in thread
From: Frank Ch. Eigler @ 2005-09-06 15:09 UTC (permalink / raw)
  To: prasanna; +Cc: systemtap


prasanna wrote:

> [...]  yes, in situtations when fault_handler is executed, users can
> either correct the faulty instruction and singlestep again or he can
> just replace back the original instruction, disarm the probe and
> continue.  [...]

Is there a modules-accessible kprobes API for disarming a probe on the
fly?

One thing perhaps I didn't make clear is that I am hoping to use the
kprobes fault_handler hook in order to catch problems that occur
*during* the pre_handler execution.  In other words, it is meant as
additional protection for bugs in tapsets, the translator, whatnot.

For example, if a probe handler does something crazy and branches to
pc=0, I'd like to unwind/disable the handler, signal the situation,
and let the kernel run along as if the probe wasn't there in the first
place.  It looks like this may not be all there now, so I will disable
the new code in the translator code.

- FChE

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: kprobe fault handling
@ 2005-09-06 12:56 Prasanna S Panchamukhi
  2005-09-06 15:09 ` Frank Ch. Eigler
  0 siblings, 1 reply; 33+ messages in thread
From: Prasanna S Panchamukhi @ 2005-09-06 12:56 UTC (permalink / raw)
  To: fche; +Cc: systemtap

Hi Frank,

>I'm about to commit some translator code to set a fault_handler for
>systemtap kprobes, as additional safety protection.  To test it, I've
>tried the following little crashme script, which as you see
>dereferences NULL.

>  function ta () %{ * (volatile char *) 0 = 0; %}
>  probe kernel.SOMEPLACE { ta () }


>I'm getting far enough along with this that the fault handler is
>indeed run, and a nice little message is sent via printk().  However,
>what I'd like to do is signal that the probe handler should more or
>less resume (and soon unwind and clean itself up with an error).

yes, in situtations when fault_handler is executed, users can either
correct the faulty instruction and singlestep again or he can just
replace back the original instruction, disarm the probe and continue.
That will eventually trigger a kernel panic as if there was no probe
at that location.


>A "return 0" seems inappropriate, since this would trigger a kernel
>panic.  A "return 1" seems not to work the way I expect, in that in my
>tests, this gets us into an infinite loop, as if it was retrying the
>NULL dereference every time.

Kprobes code does not handle the fault but just provides an opportunity
to correct by giving control to registered fault handler.

>What can I do from the fault handler to make sure that pre_handler
>execution is resumed, but the faulting instruction is *skipped*?

Skipping the instruction works in your testcase, but may not work
for other situations. Also to skip the instruction, you may need
to know the instruction length. In some architectures getting the
instruction length is easy like ppc64, but might not be easy in 
other architectures.

Also in such situations instead of just skipping the faulty instruction,
you can skip the whole routine, but that might trigger kernel panic
somewhere else.

>Actually I don't even need the pre_handler to resume running, just the
>kernel in general (single-stepping over the probed insn, and so on).
>Is there a way to accomplish that?

Yes, it can done..by changing the instruction pointer to the singlestep
instruction and also setting the flags for single stepping. But if user fault 
handler would have corrected the faulted instruction, resuming with 
pre_handler() should also work. Am I missing something here????


Thanks
Prasanna
-- 

Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Ph: 91-80-25044636
<prasanna@in.ibm.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2006-02-13 13:39 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-06  1:06 kprobe fault handling Frank Ch. Eigler
2005-09-06 12:56 Prasanna S Panchamukhi
2005-09-06 15:09 ` Frank Ch. Eigler
2005-09-08 11:52   ` Prasanna S Panchamukhi
2005-09-08 17:42     ` Frank Ch. Eigler
2006-02-06 19:49 Martin Hunt
2006-02-07  0:51 ` Jim Keniston
2006-02-07 17:31   ` Jim Keniston
2006-02-07 17:50     ` Martin Hunt
2006-02-07 19:49       ` Jim Keniston
2006-02-08  4:38         ` Suparna Bhattacharya
2006-02-08 11:32           ` Richard J Moore
2006-02-09  7:23             ` Prasanna S Panchamukhi
2006-02-09 16:33               ` Keshavamurthy Anil S
2006-02-09 21:35               ` Jim Keniston
2006-02-09 22:06                 ` Martin Hunt
2006-02-10  5:39                   ` Martin Hunt
2006-02-10 21:46                     ` Frank Ch. Eigler
2006-02-10 21:55                       ` Martin Hunt
2006-02-10 22:12                         ` Frank Ch. Eigler
2006-02-10 22:17                           ` Martin Hunt
2006-02-10 22:20                             ` Frank Ch. Eigler
2006-02-10 22:41                               ` Martin Hunt
2006-02-10 22:47                                 ` Frank Ch. Eigler
2006-02-10 23:36                                   ` Martin Hunt
2006-02-11  0:49                                     ` Frank Ch. Eigler
2006-02-12  1:26                                       ` Martin Hunt
2006-02-13 13:39                                         ` Frank Ch. Eigler
2006-02-07 20:36 Keshavamurthy, Anil S
2006-02-07 20:48 ` Martin Hunt
2006-02-07 22:19 Keshavamurthy, Anil S
2006-02-09  8:55 Mao, Bibo
2006-02-09 10:22 ` Richard J Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).