From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 18750 invoked by alias); 9 Feb 2006 21:35:45 -0000 Received: (qmail 18743 invoked by uid 22791); 9 Feb 2006 21:35:44 -0000 X-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,DNS_FROM_RFC_ABUSE,SPF_PASS X-Spam-Check-By: sourceware.org Received: from e1.ny.us.ibm.com (HELO e1.ny.us.ibm.com) (32.97.182.141) by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 09 Feb 2006 21:35:43 +0000 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e1.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id k19LZepN019182 for ; Thu, 9 Feb 2006 16:35:40 -0500 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay04.pok.ibm.com (8.12.10/NCO/VERS6.8) with ESMTP id k19LZf3n224442 for ; Thu, 9 Feb 2006 16:35:41 -0500 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id k19LZeGO025037 for ; Thu, 9 Feb 2006 16:35:40 -0500 Received: from dyn9047018079.beaverton.ibm.com (dyn9047018079.beaverton.ibm.com [9.47.18.79]) by d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id k19LZdWR024979; Thu, 9 Feb 2006 16:35:39 -0500 Subject: Re: kprobe fault handling From: Jim Keniston To: prasanna@in.ibm.com Cc: Richard J Moore , suparna@in.ibm.com, Martin Hunt , SystemTAP In-Reply-To: <20060209072338.GA2902@in.ibm.com> References: <20060208043831.GA9313@in.ibm.com> <20060209072338.GA2902@in.ibm.com> Content-Type: text/plain Organization: Message-Id: <1139520938.2779.171.camel@dyn9047018079.beaverton.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: Thu, 09 Feb 2006 21:35:00 -0000 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm Precedence: bulk List-Subscribe: List-Post: List-Help: , Sender: systemtap-owner@sourceware.org X-SW-Source: 2006-q1/txt/msg00462.txt.bz2 On Wed, 2006-02-08 at 23:23, Prasanna S Panchamukhi wrote: > Hi, > > Please find the patch below to provide robust kprobe fault handling. > Presently this patch is for i386 architecture, will port to other > architectures if this approach looks ok. This patch is currently > untested, appreciate any feedback. ... > - if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr)) > - return 1; > - I think it's right to move the above call, but if you do, your patch should update Documentation/kprobes.txt, which states: "If a fault occurs during execution of kp->pre_handler or kp->post_handler, >>> or during single-stepping of the probed instruction, <<< Kprobes calls kp->fault_handler." > if (kcb->kprobe_status & KPROBE_HIT_SS) { > + /* > + * We are here because the instruction being single stepped > + * caused a page fault. We reset the current kprobe and the > + * eip points back to the probe address and allow the page > + * fault handler. > + */ > resume_execution(cur, regs, kcb); I agree with Anil that resume_execution() can make adjustments we don't want (since the probed instruction hasn't completed). Seems like what we want is just regs->eip = (unsigned long) cur->addr; > regs->eflags |= kcb->kprobe_old_eflags; > > reset_current_kprobe(); > preempt_enable_no_resched(); Bibo said: > When single stepped instruction cause page fault, eip will point to > kprobe copied address, but not probed address, so execution table > search will fail. I don't think finding the fixup will be a problem, because we restore regs->eip to the address of the actual probed instruction before returning to do_page_fault(). > + } else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) || > + (kcb-kprobe_status & KPROBE_HIT_ACTIVE)) { > + /* > + * We come here because instructions in the pre/post handler > + * caused the page_fault, this could happen if handler tries > + * to access user space by copy_from_user(), get_user() etc. > + * Let the user-specified handler try to fix it first. > + */ > + if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr)) > + return 1; > + > + /* > + * In case the user-specified fault handler returned zero, > + * try to fix up. > + */ > + > + if (fixup_exception(regs)) > + return 1; I think it's OK to call fixup_exceptions() here, but I believe it's redundant. I understood Suparna to say (http://sourceware.org/ml/systemtap/2006-q1/msg00423.html) that if we return 0, do_page_fault() will call fixup_exceptions() instead of trying to bring in the missing page (since it's a kernel instruction -- in a handler -- that faulted). Her explanation made sense to me. We still don't handle the case of an erroneous pre/post-handler that incurs a fault with no fixup. (We'd like to do something other than let the kernel crash.) I think we'd have to do a sort of setjmp before calling the handler in order to do that. Disclaimer: Some or all of this is probably wrong (again). :-} Jim