From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 2895 invoked by alias); 9 Feb 2006 08:55:12 -0000 Received: (qmail 2872 invoked by uid 22791); 9 Feb 2006 08:55:11 -0000 X-Spam-Status: No, hits=-1.9 required=5.0 tests=AWL,BAYES_00 X-Spam-Check-By: sourceware.org Received: from fmr18.intel.com (HELO orsfmr003.jf.intel.com) (134.134.136.17) by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 09 Feb 2006 08:55:07 +0000 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id k198su8o021615; Thu, 9 Feb 2006 08:54:56 GMT Received: from pdsmsxvs01.pd.intel.com (pdsmsxvs01.pd.intel.com [172.16.12.122]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id k198srIn019805; Thu, 9 Feb 2006 08:54:53 GMT Received: from pdsmsx331.ccr.corp.intel.com ([172.16.12.58]) by pdsmsxvs01.pd.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2006020916545114862 ; Thu, 09 Feb 2006 16:54:51 +0800 Received: from pdsmsx405.ccr.corp.intel.com ([172.16.12.95]) by pdsmsx331.ccr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Thu, 9 Feb 2006 16:54:51 +0800 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: quoted-printable Subject: RE: kprobe fault handling Date: Thu, 09 Feb 2006 08:55:00 -0000 Message-ID: <9FBCE015AF479F46B3B410499F3AE05B0898D3@pdsmsx405> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: kprobe fault handling Thread-Index: AcYtSf8KZ6gDcRzeS3CZKihx77+zvwACOCBQ From: "Mao, Bibo" To: , "Richard J Moore" Cc: , "Martin Hunt" , "Jim Keniston" , "SystemTAP" X-OriginalArrivalTime: 09 Feb 2006 08:54:51.0953 (UTC) FILETIME=[7D50EA10:01C62D56] X-Scanned-By: MIMEDefang 2.52 on 10.7.209.17 X-IsSubscribed: yes Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm Precedence: bulk List-Subscribe: List-Post: List-Help: , Sender: systemtap-owner@sourceware.org X-SW-Source: 2006-q1/txt/msg00450.txt.bz2 When instruction being single stepped causes page fault, resetting current = kprobe will cause pre_handle function to execute twice. And I think main ai= m of set eip point to probed ip is for fixup_exception(regs) in do_page_fau= lt function. If probed instruction try to access memory not in VMA table, i= t will call fixup_exception(regs) to fix it, it will search exception table= and set eip to fixup instruction address. When single stepped instruction cause page fault, eip will point to kprobe = copied address, but not probed address, so execution table search will fail. And I do not whether there is good way to avoid kprobe execution twice and = jump to fixup code when single stepped instruction cause page pault. bibo,mao >-----Original Message----- >From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.or= g] >On Behalf Of Prasanna S Panchamukhi >Sent: 2006=C4=EA2=D4=C29=C8=D5 15:24 >To: Richard J Moore >Cc: suparna@in.ibm.com; Martin Hunt; Jim Keniston; SystemTAP >Subject: Re: kprobe fault handling > >Hi, > >Please find the patch below to provide robust kprobe fault handling. >Presently this patch is for i386 architecture, will port to other >architectures if this approach looks ok. This patch is currently >untested, appreciate any feedback. > >Thanks >Prasanna > >This patch provides proper kprobes fault handling, if a user specified pre= /post >handlers tires to access user address space, through copy_from_user(), >get_user() etc. The user-specified handler gets called only if the fault >occurs due to kprobes handlers and if the user-specified handler does >not fix it, we try to fix it by calling fix_exception(). >The user specified handler will not be called if the fault happens when >single stepping the original instruction. > >Signed-off-by: Prasanna S Panchamukhi > > > arch/i386/kernel/kprobes.c | 32 +++++++++++++++++++++++++++++--- > 1 files changed, 29 insertions(+), 3 deletions(-) > >diff -puN arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix >arch/i386/kernel/kprobes.c >--- >linux-2.6.16-rc1-mm5/arch/i386/kernel/kprobes.c~kprobes-fault-handling-fix > 2006-02-09 12:50:44.000000000 +0530 >+++ linux-2.6.16-rc1-mm5-prasanna/arch/i386/kernel/kprobes.c 2006-02-09 >12:50:44.000000000 +0530 >@@ -543,15 +543,41 @@ static inline int kprobe_fault_handler(s > struct kprobe *cur =3D kprobe_running(); > struct kprobe_ctlblk *kcb =3D get_kprobe_ctlblk(); > >- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr)) >- return 1; >- > if (kcb->kprobe_status & KPROBE_HIT_SS) { >+ /* >+ * We are here because the instruction being single stepped >+ * caused a page fault. We reset the current kprobe and the >+ * eip points back to the probe address and allow the page >+ * fault handler. >+ */ > resume_execution(cur, regs, kcb); > regs->eflags |=3D kcb->kprobe_old_eflags; > > reset_current_kprobe(); > preempt_enable_no_resched(); >+ } else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) || >+ (kcb-kprobe_status & KPROBE_HIT_ACTIVE)) { >+ /* >+ * We come here because instructions in the pre/post handler >+ * caused the page_fault, this could happen if handler tries >+ * to access user space by copy_from_user(), get_user() etc. >+ * Let the user-specified handler try to fix it first. >+ */ >+ if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr)) >+ return 1; >+ >+ /* >+ * In case the user-specified fault handler returned zero, >+ * try to fix up. >+ */ >+ >+ if (fixup_exception(regs)) >+ return 1; >+ >+ /* >+ * fixup_exception() could not handle it, >+ * Let do_page_fault() fix it. >+ */ > } > return 0; > } > >_ > >On Wed, Feb 08, 2006 at 11:27:53AM +0000, Richard J Moore wrote: >> >> >> >> >> >> systemtap-owner@sourceware.org wrote on 08/02/2006 04:38:31: >> >> > >> > Recalling the way this was supposed to work ... AFAICR >> > >> > There are 2 main situations where one could land up into >> > the page fault handler during kprobe processing: >> > >> > (1) We have completed the probe handler and are single-stepping the >> probed >> > instruction. The original instruction being single-stepped causes a >> > page fault. This would typically happen if that instruction issues= a >> > memory access that is paged out. >> > >> >> Yes you are correct, the code does do this, but its not supposed to. The >> pagefault handler was intended solely to protect the system from the >> probe-handler, not to catch faults in the original instruction. >> >> I think what's happened is that two needs have become confused. Going b= ack >> to the original dprobes implementation: >> >> The pagefault callback was there to allow the probe-handler to continue >> even if it touched inaccessible memory. >> The dprobes interpreter always hooked faults and protected the system fr= om >> the probe handler. >> On single-stepping the original instruction any exception was percolated= up >> to the systems, however the log buffer would be discarded by the dprobes >> event handler unless logonfault had been specified. This was implemented= to >> support tracing and to suppress multiple trace entries for tracepoint th= at >> would temporarily fault, then subsequently retry execution successfully. >> >> When we removed the dprobes interpreter to a device drive and created a >> minimal kernel API set (krpobes) to support the device driver, we needed= to >> support this capability through the kprobes APIs. So what appears to have >> happened is that the single-step fault and the handler fault have become >> handler through the same mechanism. So the question is: is this >> functionally correct? >> >> They reason why I doubt that it is correct is that any one using the >> kprobes APIs is going to need a very thorough understanding of the >> motivation behind this interface to use it correctly. I would propose >> separating out the notification of single-stepping (original instruction >> execution) exceptions from self-generated probe-handder exceptions. >> >> These are some possible options: >> >> 1) >> leave the page-fault handler to be a called routine that's invoked inste= ad >> of the post-handler when single-step generates an exception. >> create a setjump, longjump mechanism to allow the probe handler to catch >> any self-generated exception but always protect the rest of the system f= rom >> any exception that is not intercepted. >> >> 2) >> >> leave the page-fault handler as it - a called routine, but don't invoke = it >> for single-step exceptions. >> allow the page-fault handler to deal with creating or simulating a >> setjump/longjump back to the original pre or post handler code. >> pass a switch to the post-handler to indicate whether the single-stepping >> resulted in a exception. >> >> 3) a combination of 1 & 2, (which I think I prefer) >> dispense with the page-fault handler routine. >> create a setjump/long mechanism for the pre- and post-handler routines to >> use if they wish. >> always shield any unhandled exception that has been generated by the pre- >> and post- handlers from the rest of the system >> indicate via a switch to the post-handler whether the singe-stepping >> resulted in an exception (other than a debug exception of course :-)) >> >> >> Finally in response to how are things broken: >> >> Well I've just conducted an experiment on kprobes where I deliberately >> generate a pagefault in the pre-handler and it appears that my pae-fault >> handler is never called, be we do seem to be protecting the system from = may >> pagefault exception. >> >> >> > (2) We are executing the probe handler, and an instruction in the probe >> > handler causes a page fault. This could, for example, happen if the >> > probe handler attempts to access a user-space address by issuing >> > copy_from_user or get_user etc, and that address is currently >> > paged out or invalid. >> > >> > The check KPROBE_HIT_SS enables us to distinguish between these two >> cases. >> > >> > If set, it indicates that (1) must have occured. In which case, we want >> > normal page fault handling to continue to that the data accessed by the >> > instruction is paged in, and instruction execution continue after >> > fault handling as would happen in normal execution. However, since >> > this can cause a sleep, other probe hits etc, we turn off kprobes >> > related state (hence the resume execution) before letting it continue. >> Once >> > the data is paged in, the page fault handler would resume execution fr= om >> > the original probed address and thus trap again on the breakpoint >> instruction. >> > So we encounter the probe handler a second time, and this time >> single-stepping >> > would succeed since the address accessed by the instruction is already >> > in memory, and finally we would get a single-step trap and the post >> handler >> > would be executed. >> > >> > In KPROBE_HIT_SS is not set, we enter case (2). Now since we are (or at >> > least were) running without interrupts, the default behaviour was to >> > soft-fail the access, i.e. cause the copy_from_user/get_user call in >> > the probe handler to fail with -EFAULT. How does this happen ? The >> > in_atomic() check or its equivalent in do_page_fault succeeds and hence >> > control jumps to bad_area_nosemaphore where it finds that the access >> > happened in kernel mode and hence moves to no_context, where >> > fixup_exception() takes care of the rest. >> > More sophisticated or customised behaviour could be achieved by provid= ing >> > a ->fault_handler(), we could play some neat tricks (like Richard >> mentioned) >> > perform fixups etc. >> > >> > I hope that makes it a little clearer. >> > >> > Could you now help me understand what is broken now, if at all ? >> > >> > Regards >> > Suparna >> > >> > On Tue, Feb 07, 2006 at 11:49:22AM -0800, Jim Keniston wrote: >> > > On Tue, 2006-02-07 at 09:51, Martin Hunt wrote: >> > > > On Tue, 2006-02-07 at 09:31 -0800, Jim Keniston wrote: >> > > > [...] >> > > > > > > And that's it. kprobe_fault_handler returns 0. No call to >> > > > > > > fixup_exceptions()! So do_page_fault() will have to do >> > the fixups, but >> > > > > > > first it will print nasty might_sleep warnings and maybe >> > actually sleep! >> > > > > >> > > > > You have the right idea of running fixup_exceptions() in the fau= lt >> > > > > handler, to prevent do_page_fault() from attempting its >> demand-paging >> > > > > stuff. I just think it's OK to have the user's fault handler, >> rather >> > > > > than kprobe_fault_handler, run fixup_exceptions. >> > > > >> > > > I think if the user's fault handler does not handle the page fault, >> > > > kprobes should attempt to do so. Returning without handling the pa= ge >> > > > fault could leave the system in a bad state. So your proposal would >> mean >> > > > it is necessary to specify a fault handler for any kprobe that mig= ht >> > > > attempt user-space access or otherwise trigger a page fault, >> otherwise >> > > > the system could crash. >> > > >> > > I just had a long chat with Richard Moore about this whole topic. I >> > > agree with you on this, and I think Richard would, too. >> > > >> > > So unless there's a user-specified handler and that handler specifies >> > > (by returning 1) that it has handled the exception, >> > > kprobe_fault_handler() should run fixup_exception(), right? >> > > >> > > Now I'm looking, later in that function, at the code (on i386) where= we >> > > handle an exception while single-stepping. I don't think >> > > resume_execution() is the right thing to do here. We haven't >> > > successfully executed the probed instruction, and the eip still poin= ts >> > > at that instruction, right? I think we're just hosed at this point. >> > > Comments? >> > > >> > > > >> > > > Any more importantly, fixup_exception() and all the exception table >> code >> > > > are currently not exported so cannot be called from a module. >> > > >> > > It's hard to argue with that. :-) >> > > >> > > > >> > > > Martin >> > > >> > > Jim >> > > >> > > >> > >-- >Prasanna S Panchamukhi >Linux Technology Center >India Software Labs, IBM Bangalore >Email: prasanna@in.ibm.com >Ph: 91-80-51776329