From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <systemtap-return-2751-listarch-systemtap=sources.redhat.com@sourceware.org>
Received: (qmail 22824 invoked by alias); 9 Mar 2006 16:00:01 -0000
Received: (qmail 22762 invoked by uid 22791); 9 Mar 2006 15:59:59 -0000
X-Spam-Status: No, hits=-0.8 required=5.0 	tests=AWL,BAYES_50,DNS_FROM_RFC_ABUSE,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mtagate1.uk.ibm.com (HELO mtagate1.uk.ibm.com) (195.212.29.134)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 09 Mar 2006 15:59:58 +0000
Received: from d06nrmr1407.portsmouth.uk.ibm.com (d06nrmr1407.portsmouth.uk.ibm.com [9.149.38.185]) 	by mtagate1.uk.ibm.com (8.12.10/8.12.10) with ESMTP id k29Fxtnk253846 	for <systemtap@sources.redhat.com>; Thu, 9 Mar 2006 15:59:55 GMT
Received: from d06av04.portsmouth.uk.ibm.com (d06av04.portsmouth.uk.ibm.com [9.149.37.216]) 	by d06nrmr1407.portsmouth.uk.ibm.com (8.12.10/NCO/VER6.8) with ESMTP id k29G0BGu213448 	for <systemtap@sources.redhat.com>; Thu, 9 Mar 2006 16:00:11 GMT
Received: from d06av04.portsmouth.uk.ibm.com (loopback [127.0.0.1]) 	by d06av04.portsmouth.uk.ibm.com (8.12.11/8.13.3) with ESMTP id k29FxsUn015311 	for <systemtap@sources.redhat.com>; Thu, 9 Mar 2006 15:59:54 GMT
Received: from d06ml065.portsmouth.uk.ibm.com (d06ml065.portsmouth.uk.ibm.com [9.149.38.138]) 	by d06av04.portsmouth.uk.ibm.com (8.12.11/8.12.11) with ESMTP id k29FxsoB015303 	for <systemtap@sources.redhat.com>; Thu, 9 Mar 2006 15:59:54 GMT
Sensitivity:
Subject: thoughts about exception-handling requirements for kprobes
To: systemtap@sources.redhat.com
X-Mailer: Lotus Notes Release 6.5.1IBM February 19, 2004
Message-ID: <OFFFD05E79.21688AA2-ON8025712C.0052DE26-8025712C.0057A4F8@uk.ibm.com>
From: Richard J Moore <richardj_moore@uk.ibm.com>
Date: Thu, 09 Mar 2006 16:00:00 -0000
X-MIMETrack: Serialize by Router on D06ML065/06/M/IBM(Release 6.53HF247 | January 6, 2005) at  09/03/2006 16:00:11
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:systemtap-subscribe@sourceware.org>
List-Post: <mailto:systemtap@sourceware.org>
List-Help: <mailto:systemtap-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2006-q1/txt/msg00745.txt.bz2


I've been thinking about the need for exception-handling and how the
current implementation has become a little muddled.

These are the categories we need to think about:


1) Expected exception, user pre-handler.
Here the pre-handler code knows it's going to do something dodgy and
doesn't want to cause a big fuss. If the memory it accesses causes a fault
then the pre-handler continues with something else.
Where could this occur? When gathering data for a trace record from various
pageable locations. Under such circumstances one would not want to have
the pre-handler cancelled (i.e. terminated prematurely) because one or
more items were not available.

2) Unexpected exception, user pre-handler.

Here the pre-handler either has a bug or is debugging a badly damaged
environment. Let's not forget that the latter environment is very important
to cater for as best we can.
The response should at the very least be to quietly cancel the pre-handler.
It's also conceivable that one might want to intercept this to free off any
locks and put out an explanatory message.

3) Exception on the probed instruction during single-step.

kprobes needs to know this has happened so that the usual clean-up can be
done following single-step. In addition kprobes needs to fix up certain
processor flags appropriately if the exception was on a flag-altering
instruction. The user post-handler generally doesn't want to be called
since that results in duplicated trace records, one for each retry of the
probed instruction if retry is attempted. There are, however, two cases
where the user post-handler does want to know that the single-step
generated an exception:

a) where the exception on the probed instruction is not retryable - e.g. it
traps rather than faults. In this case we will never get the chance to
record information about that event.
b) where we are interested specifically in the number of times an
instruction is retried.

4) Expected exception, user post-handler.
Same considerations as for the pre-handler (case 1).

5) Unexpected exception, user post-handler.
Same considerations as for the pre-handler (case 2).


It is possible that systemtap might not yet want to exploit all of these 5
categories. However, I can see circumstances where natively written probe
handlers would. And I can see that systemtap might well want to do this is
due course.


Cases 1 & 4:
The expected exceptions in user pre- and post-handlers are easy to deal
with in a way that performs well: this is via a setjmp/longjmp mechanism.
setjmp is trivial to implement; examples can be seen in kgdb and xmon and
arguably setjmp ought to be a common kernel routine for certain categories
of use.

So a user pre-handler might do something like this:


if (setjmp(jbuf) == 0) {

   do something dodgy;

}
else printf("yes, that was dodgy");


In a fault-handler the user would do:

if (setjmp-buf-set-up) {
   longjmp(jbuf,1);
}


However, it seems pointless to call the page-fault handler just to have it
execute the longjmp. Also the page-fault handler needs to be able to
determine unequivocally that the fault it was entered for was fenced by a
setjmp.

It would seem better to have kprobes put a wrapper around setjmp (ksetjmp)
and issue the longjmp directly. kprobes could also maintain a maxfault
counter for the probe so that we can exit recursive fault situations.
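
To make the ksetjmp idea a little more concrete, here is a minimal
user-space sketch using the standard C setjmp/longjmp. Everything named in
it (struct probe_fence, fence_run, fence_fault, the FENCE_* codes and the
demo bodies) is hypothetical and not part of kprobes. Because the frame
that called setjmp must still be live when longjmp fires, the wrapper takes
the dodgy code as a callback instead of wrapping setjmp in a function that
returns. The maxfault budget is interpreted here as a per-probe count of
fenced faults, which is only one possible reading.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum fence_result { FENCE_OK = 0, FENCE_FAULTED = 1, FENCE_CANCELLED = 2 };

/* Hypothetical per-probe fence state, owned by the framework. */
struct probe_fence {
    jmp_buf      env;       /* recovery point set by fence_run() */
    int          armed;     /* non-zero only while a fenced body runs */
    unsigned int nfaults;   /* fenced faults taken so far */
    unsigned int maxfault;  /* budget; beyond it the handler is cancelled */
};

typedef void (*fence_body_fn)(struct probe_fence *f, void *arg);

/*
 * Stand-in for the framework's fault path. If the fault was fenced it
 * jumps straight back to fence_run(); no user fault-handler is involved.
 * If nothing is armed it returns and normal fault handling would proceed.
 */
void fence_fault(struct probe_fence *f)
{
    assert(f != NULL);
    if (!f->armed)
        return;
    f->armed = 0;
    f->nfaults++;
    longjmp(f->env,
            f->nfaults > f->maxfault ? FENCE_CANCELLED : FENCE_FAULTED);
}

/* Run body inside the fence and report how it ended. */
enum fence_result fence_run(struct probe_fence *f, fence_body_fn body,
                            void *arg)
{
    assert(f != NULL && body != NULL);
    switch (setjmp(f->env)) {
    case 0:
        f->armed = 1;
        body(f, arg);
        f->armed = 0;
        return FENCE_OK;
    case FENCE_FAULTED:
        return FENCE_FAULTED;   /* caller carries on with something else */
    default:
        return FENCE_CANCELLED; /* fault budget exhausted: stop the handler */
    }
}

/* Demo bodies: one completes, one always hits a simulated fault. */
void demo_clean_body(struct probe_fence *f, void *arg)
{
    (void)f;
    *(int *)arg = 1;
}

void demo_faulting_body(struct probe_fence *f, void *arg)
{
    (void)arg;
    fence_fault(f);
}
```

With maxfault set to 2, two faulting calls to fence_run() report
FENCE_FAULTED and the third reports FENCE_CANCELLED, at which point a
handler would stop retrying.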

setjmp is truly trivial to implement and has exceedingly low overhead. I
can't see a good reason for not implementing this.  Could we exploit it
from systemtap?
How about using a try-catch semantic:

try {

     do some dodgy stuff;
}
catch {

     phew();
}

next();
.
.
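
As a rough illustration of what such a try-catch could lower to in
generated C, here is a sketch built on the standard setjmp/longjmp. The
TRY, CATCH and THROW macros and the helper names are made up for this
example, the fence is a single non-nested one for brevity, and the fault
is simulated by passing a NULL address rather than taken for real.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf try_buf;          /* single, non-nested fence for brevity */
static volatile int try_armed;   /* non-zero only inside a TRY block */

/* Hypothetical macros; TRY opens a block that CATCH closes. */
#define TRY     if (setjmp(try_buf) == 0) { try_armed = 1;
#define CATCH   try_armed = 0; } else
#define THROW() do { if (try_armed) { try_armed = 0; \
                     longjmp(try_buf, 1); } } while (0)

/* Stand-in for a pageable read: a NULL address plays the faulting case. */
static long dodgy_read(const long *addr)
{
    if (addr == NULL)
        THROW();
    assert(addr != NULL);   /* reaching here with NULL means no TRY was armed */
    return *addr;
}

/* Script-like handler: returns the value read, or fallback on a fault. */
long read_or_default(const long *addr, long fallback)
{
    volatile long value = fallback;  /* volatile: must survive the longjmp */

    TRY {
        value = dodgy_read(addr);    /* do some dodgy stuff */
    } CATCH {
        value = fallback;            /* phew() */
    }
    return value;                    /* next() */
}
```

A call such as read_or_default(NULL, -1) takes the CATCH branch and
returns the fallback, while a valid address returns the value stored
there.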

Cases 2 & 5:

Here we need to allow the user-probe to clean up from unexpected
exceptions. The page-fault handler as currently specified would seem to be
ideal. There would be no need to surface this at the systemtap scripting
level.
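
For illustration only, a user-space mock of the kind of clean-up such a
fault handler might do: drop whatever lock the handler held, mark the
handler as cancelled, and leave an explanatory message. The struct and
function names are invented for this sketch and do not reflect the real
kprobes interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-probe state; not the kernel's struct kprobe. */
struct mock_probe {
    const char *name;
    bool lock_held;   /* stands in for any lock the handler took */
    bool cancelled;   /* set once the handler has been abandoned */
};

/*
 * Fault hook for an unexpected exception in a pre- or post-handler.
 * Writes an explanatory message into msg and returns snprintf's result.
 */
int mock_fault_handler(struct mock_probe *p, int trapnr,
                       char *msg, size_t len)
{
    assert(p != NULL && msg != NULL);
    if (p->lock_held)
        p->lock_held = false;   /* free off the lock before bailing out */
    p->cancelled = true;        /* quietly cancel the rest of the handler */
    return snprintf(msg, len, "probe %s: handler cancelled after trap %d",
                    p->name, trapnr);
}
```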


Case 3:

I understand that many users don't have a need for this, so the
implementation should not impact those users. In this case I suggest that
either
a) a duplicate post-handler hook be allowed by kprobes that would only be
called if the user registers such a handler and the single-step results in
an exception.

b) we allow the user to specify that the post-handler should be called for
both the exception and non-exception cases, and have the call to the
post-handler indicate which has happened.

In either case (a or b), on return from the user handler, kprobes would do
the normal back-end probepoint handling.
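
The two options can be compared side by side in a small user-space mock.
All names here (struct ss_probe, its fields, ss_after_single_step,
count_post) are hypothetical; the point is only that users who register
neither the extra hook nor the opt-in flag see no change in behaviour.

```c
#include <assert.h>
#include <stddef.h>

struct ss_probe;

/* Post-handler that is told whether the single-step took an exception. */
typedef void (*ss_post_fn)(struct ss_probe *p, int ss_faulted);

/* Hypothetical probe descriptor; not the kernel's struct kprobe. */
struct ss_probe {
    ss_post_fn post_handler;        /* ordinary post-handler */
    ss_post_fn post_fault_handler;  /* option (a): optional duplicate hook */
    int        post_wants_faults;   /* option (b): opt-in flag */
    int        normal_calls;        /* bookkeeping for the demo handler */
    int        fault_calls;
};

/* Called after the single-step, before the usual back-end clean-up. */
void ss_after_single_step(struct ss_probe *p, int ss_faulted)
{
    assert(p != NULL);
    if (!ss_faulted) {
        if (p->post_handler)
            p->post_handler(p, 0);
    } else if (p->post_fault_handler) {
        p->post_fault_handler(p, 1);    /* option (a) */
    } else if (p->post_wants_faults && p->post_handler) {
        p->post_handler(p, 1);          /* option (b) */
    }
    /*
     * Otherwise a faulting single-step stays invisible to the user, so
     * retried instructions do not produce duplicate trace records.
     * Normal back-end probepoint handling would follow here in all cases.
     */
}

/* Demo handler: counts calls by kind. */
void count_post(struct ss_probe *p, int ss_faulted)
{
    if (ss_faulted)
        p->fault_calls++;
    else
        p->normal_calls++;
}
```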


Since systemtap currently hides pre- and post-handling from the scripting
level, it seems unlikely that this would be exploited without expanding the
specification of a probepoint in some messy way (e.g.probe pre syscall.open
vs probe post syscall.open vs probe fault syscall.open).  But it might be
nice to see if there's a way to support case 3a implicitly. I am strongly
of the opinion that we shouldn't handle case 3 through the page-fault
handler as we attempted to in earlier versions of kprobes - there are just
too many diverse potential reasons for the page-fault handler to be called.


--
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072