From: Richard J Moore
To: systemtap@sources.redhat.com
Subject: thoughts about exception-handling requirements for kprobes
Date: Thu, 09 Mar 2006 16:00:00 -0000

I've been thinking about the need for exception-handling and how the current
implementation has become a little muddled. These are the categories we need to think about:

1) Expected exception, user pre-handler.
   Here the pre-handler code knows it's going to do something dodgy and doesn't want to cause a big fuss. If the memory it accesses causes a fault, the pre-handler continues with something else. Where could this occur? When gathering data for a trace record from various pageable locations. In such a circumstance one would not want the pre-handler cancelled (i.e. terminated prematurely) because one or more items were not available.

2) Unexpected exception, user pre-handler.
   Here the pre-handler either has a bug or is debugging a badly damaged environment. Let's not forget that the latter environment is very important to cater for as best we can. The response should at the very least be to quietly cancel the pre-handler. It's also conceivable that one might want to intercept this to free any locks and put out an explanatory message.

3) Exception on the probed instruction during single-step.
   kprobes needs to know this has happened so that the usual clean-up can be done following single-step. In addition, kprobes needs to fix up certain processor flags appropriately if the exception was on a flag-altering instruction. The user post-handler generally doesn't want to be called, since that results in duplicated trace records, one for each retry of the probed instruction if retry is attempted. There are two cases where the user post-handler does want to know that the single-step generated an unexpected exception:
   a) where the exception on the probed instruction is not retryable, e.g. it traps rather than faults. In this case we would otherwise never get the chance to record information about that event.
   b) where we are interested specifically in the number of times an instruction is retried.

4) Expected exception, user post-handler.
   Same consideration as for the pre-handler.

5) Unexpected exception, user post-handler.
   Same consideration as for the pre-handler.

It is possible that systemtap might not yet want to exploit all 5 of these categories. However, I can see circumstances where natively written probe handlers would, and I can see that systemtap might well want to do this in due course.

Cases 1 & 4:

The expected exceptions in user pre- and post-handlers are easy to deal with in a way that performs well: via a setjmp/longjmp mechanism. setjmp is trivial to implement; examples can be seen in kgdb and xmon, and arguably setjmp ought to be a common kernel routine for certain categories of use.

So a user pre-handler might do something like this:

    if (setjmp(jbuf) == 0) {
        do something dodgy;
    } else
        printf("yes, that was dodgy");

From a fault-handler the user would do:

    if (setjmp-buf-set-up) {
        longjmp(jbuf, 1);
    }

However, it seems pointless to call the page-fault handler just to have it execute the longjmp. Also, the page-fault handler needs to be able to determine unequivocally that the fault it was entered for was fenced by a setjmp. It would seem better to have kprobes put a wrapper around setjmp (ksetjmp) and issue the longjmp directly. kprobes could also maintain a maxfault counter for the probe so that we can exit recursive fault situations.

setjmp is truly trivial to implement and has exceedingly low overhead. I can't see a good reason for not implementing this.

Could we exploit it from systemtap? How about using a try-catch semantic:

    try {
        do some dodgy stuff;
    } catch {
        phew();
    }
    next();
    .
    .

Cases 2 & 5:

Here we need to allow the user probe to clean up from unexpected exceptions. The page-fault handler as currently specified would seem to be ideal. There would be no need to surface this at the systemtap scripting level.

Case 3:

I understand that many users don't have a need for this, so the implementation should not impact those users.
In this case I suggest that either

a) a duplicate post-handler hook be allowed by kprobes, which would only be called if the user registers such a handler and the single-step results in an exception; or

b) we allow the user to specify that the post-handler should be called for both the exception and non-exception cases, and have the call to the post-handler indicate which has happened.

In either case (a or b), on return from the user handler, kprobes would do the normal back-end probepoint handling.

Since systemtap currently hides pre- and post-handling from the scripting level, it seems unlikely that this would be exploited without expanding the specification of a probepoint in some messy way (e.g. probe pre syscall.open vs probe post syscall.open vs probe fault syscall.open). But it might be nice to see if there's a way to support case 3a implicitly.

I am strongly of the opinion that we shouldn't handle case 3 through the page-fault handler as we attempted to in earlier versions of kprobes: there are just too many diverse potential reasons for the page-fault handler to be called.

--
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072
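
To make the setjmp/longjmp fencing for cases 1 & 4 a little more concrete, here is a minimal user-space sketch in standard C. It is only an illustration under assumptions, not kprobes code: struct probe_fence, fence_fault(), read_item(), pre_handler() and demo() are all hypothetical names, and a NULL pointer stands in for a page that is not resident. The fault path issues the longjmp itself when a fence is armed, and a maxfault limit lets it give up instead of absorbing faults forever. Note that setjmp is called directly in the handler rather than inside a wrapper function, because a jump buffer must not be used after the function that called setjmp has returned; a real ksetjmp would probably need to be a macro or inline asm for the same reason.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical per-probe state; none of these names exist in kprobes. */
struct probe_fence {
    jmp_buf jbuf;   /* resume point saved by setjmp */
    int armed;      /* non-zero while a fenced region is active */
    int faults;     /* expected faults absorbed so far */
    int maxfault;   /* stop absorbing once this many have occurred */
};

/*
 * Stand-in for the fault path. If the fault is fenced and the limit has
 * not been reached, jump straight back to the handler. Otherwise return 0
 * so that the caller can treat the fault as unexpected.
 */
static int fence_fault(struct probe_fence *f)
{
    if (!f->armed || f->faults >= f->maxfault)
        return 0;
    f->faults++;
    f->armed = 0;
    longjmp(f->jbuf, 1);
}

/* Simulated read of pageable data: a NULL pointer plays the part of a fault. */
static int read_item(struct probe_fence *f, const int *p, int *val)
{
    if (p == NULL) {
        fence_fault(f);   /* does not return when the fault is fenced */
        return -1;        /* unfenced: report an unexpected fault */
    }
    *val = *p;
    return 0;
}

/*
 * A pre-handler that gathers items for a trace record. Items that fault
 * are skipped; an unexpected fault quietly cancels the rest of the handler.
 * Returns the number of items gathered.
 */
static int pre_handler(struct probe_fence *f, const int *items[], int n, int out[])
{
    /* volatile keeps these locals well defined across a longjmp */
    volatile int i;
    volatile int gathered = 0;

    for (i = 0; i < n; i++) {
        int val;

        f->armed = 1;
        if (setjmp(f->jbuf) == 0) {
            if (read_item(f, items[i], &val) != 0) {
                f->armed = 0;
                break;                /* unexpected fault: cancel */
            }
            out[gathered] = val;
            gathered++;
        }
        /* else: the item was unavailable, carry on with the next one */
        f->armed = 0;
    }
    return gathered;
}

/* Runs the handler over six items, three of which "fault". */
int demo(int maxfault, int *faults_out)
{
    static const int a = 1, b = 2, c = 3;
    const int *items[] = { &a, NULL, &b, NULL, NULL, &c };
    int out[6];
    struct probe_fence f = { .armed = 0, .faults = 0, .maxfault = maxfault };
    int n;

    assert(maxfault >= 0);
    n = pre_handler(&f, items, 6, out);
    *faults_out = f.faults;
    return n;
}
```

With a generous limit, demo() gathers the three readable items and absorbs three faults. With maxfault set to 2, the third fault is treated as unexpected and the handler stops early with two items gathered, which is roughly the behaviour I would want from a maxfault counter in a recursive fault situation.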