From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <systemtap-return-2456-listarch-systemtap=sources.redhat.com@sourceware.org>
Received: (qmail 2895 invoked by alias); 9 Feb 2006 08:55:12 -0000
Received: (qmail 2872 invoked by uid 22791); 9 Feb 2006 08:55:11 -0000
X-Spam-Status: No, hits=-1.9 required=5.0 	tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from fmr18.intel.com (HELO orsfmr003.jf.intel.com) (134.134.136.17)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 09 Feb 2006 08:55:07 +0000
Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) 	by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id k198su8o021615; 	Thu, 9 Feb 2006 08:54:56 GMT
Received: from pdsmsxvs01.pd.intel.com (pdsmsxvs01.pd.intel.com [172.16.12.122]) 	by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id k198srIn019805; 	Thu, 9 Feb 2006 08:54:53 GMT
Received: from pdsmsx331.ccr.corp.intel.com ([172.16.12.58])  by pdsmsxvs01.pd.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2006020916545114862  ; Thu, 09 Feb 2006 16:54:51 +0800
Received: from pdsmsx405.ccr.corp.intel.com ([172.16.12.95]) by pdsmsx331.ccr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); 	 Thu, 9 Feb 2006 16:54:51 +0800
X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; 	charset="gb2312"
Content-Transfer-Encoding: quoted-printable
Subject: RE: kprobe fault handling
Date: Thu, 09 Feb 2006 08:55:00 -0000
Message-ID: <9FBCE015AF479F46B3B410499F3AE05B0898D3@pdsmsx405>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: kprobe fault handling
Thread-Index: AcYtSf8KZ6gDcRzeS3CZKihx77+zvwACOCBQ
From: "Mao, Bibo" <bibo.mao@intel.com>
To: <prasanna@in.ibm.com>, "Richard J Moore" <richardj_moore@uk.ibm.com>
Cc: <suparna@in.ibm.com>, "Martin Hunt" <hunt@redhat.com>,         "Jim Keniston" <jkenisto@us.ibm.com>,         "SystemTAP" <systemtap@sources.redhat.com>
X-OriginalArrivalTime: 09 Feb 2006 08:54:51.0953 (UTC) FILETIME=[7D50EA10:01C62D56]
X-Scanned-By: MIMEDefang 2.52 on 10.7.209.17
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:systemtap-subscribe@sourceware.org>
List-Post: <mailto:systemtap@sourceware.org>
List-Help: <mailto:systemtap-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2006-q1/txt/msg00450.txt.bz2

When instruction being single stepped causes page fault, resetting current =
kprobe will cause pre_handle function to execute twice. And I think main ai=
m of set eip point to probed ip is for fixup_exception(regs) in do_page_fau=
lt function. If probed instruction try to access memory not in VMA table, i=
t will call fixup_exception(regs) to fix it, it will search exception table=
 and set eip to fixup instruction address.
When single stepped instruction cause page fault, eip will point to kprobe =
copied address, but not probed address, so execution table search will fail.
And I do not whether there is good way to avoid kprobe execution twice and =
jump to fixup code when single stepped instruction cause page pault.

bibo,mao

>-----Original Message-----
>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.or=
g]
>On Behalf Of Prasanna S Panchamukhi
>Sent: 2006=C4=EA2=D4=C29=C8=D5 15:24
>To: Richard J Moore
>Cc: suparna@in.ibm.com; Martin Hunt; Jim Keniston; SystemTAP
>Subject: Re: kprobe fault handling
>
>Hi,
>
>Please find the patch below to provide robust kprobe fault handling.
>Presently this patch is for i386 architecture, will port to other
>architectures if this approach looks ok. This patch is currently
>untested, appreciate any feedback.
>
>Thanks
>Prasanna
>
>This patch provides proper kprobes fault handling, if a user specified pre=
/post
>handlers tires to access user address space, through copy_from_user(),
>get_user() etc. The user-specified handler gets called only if the fault
>occurs due to kprobes handlers and if the user-specified handler does
>not fix it, we try to fix it by calling fix_exception().
>The user specified handler will not be called if the fault happens when
>single stepping the original instruction.
>
>Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>
>
> arch/i386/kernel/kprobes.c |   32 +++++++++++++++++++++++++++++---
> 1 files changed, 29 insertions(+), 3 deletions(-)
>
>diff -puN arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix
>arch/i386/kernel/kprobes.c
>---
>linux-2.6.16-rc1-mm5/arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix
>	2006-02-09 12:50:44.000000000 +0530
>+++ linux-2.6.16-rc1-mm5-prasanna/arch/i386/kernel/kprobes.c	2006-02-09
>12:50:44.000000000 +0530
>@@ -543,15 +543,41 @@ static inline int kprobe_fault_handler(s
> 	struct kprobe *cur =3D kprobe_running();
> 	struct kprobe_ctlblk *kcb =3D get_kprobe_ctlblk();
>
>-	if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
>-		return 1;
>-
> 	if (kcb->kprobe_status & KPROBE_HIT_SS) {
>+		/*
>+		 * We are here because the instruction being single stepped
>+		 * caused a page fault. We reset the current kprobe and the
>+		 * eip points back to the probe address and allow the page
>+		 * fault handler.
>+		 */
> 		resume_execution(cur, regs, kcb);
> 		regs->eflags |=3D kcb->kprobe_old_eflags;
>
> 		reset_current_kprobe();
> 		preempt_enable_no_resched();
>+	} else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) ||
>+			(kcb-kprobe_status & KPROBE_HIT_ACTIVE)) {
>+		/*
>+		 * We come here because instructions in the pre/post handler
>+		 * caused the page_fault, this could happen if handler tries
>+		 * to access user space by copy_from_user(), get_user() etc.
>+		 * Let the user-specified handler try to fix it first.
>+		 */
>+		if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
>+			return 1;
>+
>+		/*
>+		 * In case the user-specified fault handler returned zero,
>+		 * try to fix up.
>+		 */
>+
>+		if (fixup_exception(regs))
>+			return 1;
>+
>+		/*
>+		 * fixup_exception() could not handle it,
>+		 * Let do_page_fault() fix it.
>+		 */
> 	}
> 	return 0;
> }
>
>_
>
>On Wed, Feb 08, 2006 at 11:27:53AM +0000, Richard J Moore wrote:
>>
>>
>>
>>
>>
>> systemtap-owner@sourceware.org wrote on 08/02/2006 04:38:31:
>>
>> >
>> > Recalling the way this was supposed to work ... AFAICR
>> >
>> > There are 2 main situations where one could land up into
>> > the page fault handler during kprobe processing:
>> >
>> > (1) We have completed the probe handler and are single-stepping the
>> probed
>> >     instruction. The original instruction being single-stepped causes a
>> >     page fault. This would typically happen if that instruction issues=
 a
>> >     memory access that is paged out.
>> >
>>
>> Yes you are correct, the code does do this, but its not supposed to. The
>> pagefault handler was intended solely to protect the system from the
>> probe-handler, not to catch faults in the original instruction.
>>
>> I think what's happened is that two needs have become confused.  Going b=
ack
>> to the original dprobes implementation:
>>
>> The pagefault callback was there to allow the probe-handler to continue
>> even if it touched inaccessible memory.
>> The dprobes interpreter always hooked faults and protected the system fr=
om
>> the probe handler.
>> On single-stepping the original instruction any exception was percolated=
 up
>> to the systems, however the log buffer would be discarded by the dprobes
>> event handler unless logonfault had been specified. This was implemented=
 to
>> support tracing and to suppress multiple trace entries for tracepoint th=
at
>> would temporarily fault, then subsequently retry execution successfully.
>>
>> When we removed the dprobes interpreter to a device drive and created a
>> minimal kernel API set (krpobes) to support the device driver, we needed=
 to
>> support this capability through the kprobes APIs. So what appears to have
>> happened is that the single-step fault and the handler fault have become
>> handler through the same mechanism. So the question is: is this
>> functionally correct?
>>
>> They reason why I doubt that it is correct is that any one using the
>> kprobes APIs is going to need a very thorough understanding of the
>> motivation behind this interface to use it correctly. I would propose
>> separating out the notification of single-stepping (original instruction
>> execution) exceptions from self-generated probe-handder exceptions.
>>
>> These are some possible options:
>>
>> 1)
>> leave the page-fault handler to be a called routine that's invoked inste=
ad
>> of the post-handler when single-step generates an exception.
>> create a setjump, longjump mechanism to allow the probe handler to catch
>> any self-generated exception but always protect the rest of the system f=
rom
>> any exception that is not intercepted.
>>
>> 2)
>>
>> leave the page-fault handler as it - a called routine, but don't invoke =
it
>> for single-step exceptions.
>> allow the page-fault handler to deal with creating or simulating a
>> setjump/longjump back to the original pre or post handler code.
>> pass a switch to the post-handler to indicate whether the single-stepping
>> resulted in a exception.
>>
>> 3) a combination of 1 & 2, (which I think I prefer)
>> dispense with the page-fault handler routine.
>> create a setjump/long mechanism for the pre- and post-handler routines to
>> use if they wish.
>> always shield any unhandled exception that has been generated by the pre-
>> and post- handlers from the rest of the system
>> indicate via a switch to the post-handler whether the singe-stepping
>> resulted in an exception (other than a debug exception of course :-))
>>
>>
>> Finally in response to how are things broken:
>>
>> Well I've just conducted an experiment on kprobes where I deliberately
>> generate a pagefault in the pre-handler and it appears that my pae-fault
>> handler is never called, be we do seem to be protecting the system from =
may
>> pagefault exception.
>>
>>
>> > (2) We are executing the probe handler, and an instruction in the probe
>> >     handler causes a page fault. This could, for example, happen if the
>> >     probe handler attempts to access a user-space address by issuing
>> >     copy_from_user or get_user etc, and that address is currently
>> >     paged out or invalid.
>> >
>> > The check KPROBE_HIT_SS enables us to distinguish between these two
>> cases.
>> >
>> > If set, it indicates that (1) must have occured. In which case, we want
>> > normal page fault handling to continue to that the data accessed by the
>> > instruction is paged in, and instruction execution continue after
>> > fault handling as would happen in normal execution. However, since
>> > this can cause a sleep, other probe hits etc, we turn off kprobes
>> > related state (hence the resume execution) before letting it continue.
>> Once
>> > the data is paged in, the page fault handler would resume execution fr=
om
>> > the original probed address and thus trap again on the breakpoint
>> instruction.
>> > So we encounter the probe handler a second time, and this time
>> single-stepping
>> > would succeed since the address accessed by the instruction is already
>> > in memory, and finally we would get a single-step trap and the post
>> handler
>> > would be executed.
>> >
>> > In KPROBE_HIT_SS is not set, we enter case (2). Now since we are (or at
>> > least were) running without interrupts, the default behaviour was to
>> > soft-fail the access, i.e. cause the copy_from_user/get_user call in
>> > the probe handler to fail with -EFAULT. How does this happen ? The
>> > in_atomic() check or its equivalent in do_page_fault succeeds and hence
>> > control jumps to bad_area_nosemaphore where it finds that the access
>> > happened in kernel mode and hence moves to no_context, where
>> > fixup_exception() takes care of the rest.
>> > More sophisticated or customised behaviour could be achieved by provid=
ing
>> > a ->fault_handler(), we could play some neat tricks (like Richard
>> mentioned)
>> > perform fixups etc.
>> >
>> > I hope that makes it a little clearer.
>> >
>> > Could you now help me understand what is broken now, if at all ?
>> >
>> > Regards
>> > Suparna
>> >
>> > On Tue, Feb 07, 2006 at 11:49:22AM -0800, Jim Keniston wrote:
>> > > On Tue, 2006-02-07 at 09:51, Martin Hunt wrote:
>> > > > On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote:
>> > > > [...]
>> > > > > > > And that's it. kprobe_fault_handler returns 0.  No call to
>> > > > > > > fixup_exceptions()!  So do_page_fault() will have to do
>> > the fixups, but
>> > > > > > > first it will print nasty might_sleep warnings and maybe
>> > actually sleep!
>> > > > >
>> > > > > You have the right idea of running fixup_exceptions() in the fau=
lt
>> > > > > handler, to prevent do_page_fault() from attempting its
>> demand-paging
>> > > > > stuff.  I just think it's OK to have the user's fault handler,
>> rather
>> > > > > than kprobe_fault_handler, run fixup_exceptions.
>> > > >
>> > > > I think if the user's fault handler does not handle the page fault,
>> > > > kprobes should attempt to do so. Returning without handling the pa=
ge
>> > > > fault could leave the system in a bad state. So your proposal would
>> mean
>> > > > it is necessary to specify a fault handler for any kprobe that mig=
ht
>> > > > attempt user-space access or otherwise trigger a page fault,
>> otherwise
>> > > > the system could crash.
>> > >
>> > > I just had a long chat with Richard Moore about this whole topic.  I
>> > > agree with you on this, and I think Richard would, too.
>> > >
>> > > So unless there's a user-specified handler and that handler specifies
>> > > (by returning 1) that it has handled the exception,
>> > > kprobe_fault_handler() should run fixup_exception(), right?
>> > >
>> > > Now I'm looking, later in that function, at the code (on i386) where=
 we
>> > > handle an exception while single-stepping.  I don't think
>> > > resume_execution() is the right thing to do here.  We haven't
>> > > successfully executed the probed instruction, and the eip still poin=
ts
>> > > at that instruction, right?  I think we're just hosed at this point.
>> > > Comments?
>> > >
>> > > >
>> > > > Any more importantly, fixup_exception() and all the exception table
>> code
>> > > > are currently not exported so cannot be called from a module.
>> > >
>> > > It's hard to argue with that. :-)
>> > >
>> > > >
>> > > > Martin
>> > >
>> > > Jim
>> > >
>> > >
>>
>
>--
>Prasanna S Panchamukhi
>Linux Technology Center
>India Software Labs, IBM Bangalore
>Email: prasanna@in.ibm.com
>Ph: 91-80-51776329