public inbox for systemtap@sourceware.org
* Return values for vm.pagefault.return changed with newer kernels
@ 2009-02-04 21:49 William Cohen
  2009-02-05 16:56 ` Frank Ch. Eigler
  0 siblings, 1 reply; 5+ messages in thread
From: William Cohen @ 2009-02-04 21:49 UTC
  To: SystemTAP

I am going through and redistributing the scripts in
testsuite/systemtap.samples. I run the scripts to make sure that they still
return reasonable data. I found that the pfaults.stp script was returning only
VM_FAULT_OOM on the F10 machine (2.6.27.12-170.2.5.fc10.x86_64). This is due to
some changes in the page fault handler. pfaults.stp locally defines the fault
types:

global VM_FAULT_OOM, VM_FAULT_SIGBUS, VM_FAULT_MINOR, VM_FAULT_MAJOR
probe begin {
  VM_FAULT_OOM=-1
  VM_FAULT_SIGBUS=0
  VM_FAULT_MINOR=1
  VM_FAULT_MAJOR=2
}

This corresponds to the fault classes in v2.6.22 kernel/include/linux/mm.h
(although the numeric values there are one higher than the script's):

/*
 * Different kinds of faults, as returned by handle_mm_fault().
 * Used to decide whether a process gets delivered SIGBUS or
 * just gets major/minor fault counters bumped up.
 */
#define VM_FAULT_OOM    0x00
#define VM_FAULT_SIGBUS 0x01
#define VM_FAULT_MINOR  0x02
#define VM_FAULT_MAJOR  0x03

However, this doesn't work with 2.6.23 and newer kernels. In v2.6.23,
kernel/include/linux/mm.h switched to a bit-flag scheme:

/*
 * Different kinds of faults, as returned by handle_mm_fault().
 * Used to decide whether a process gets delivered SIGBUS or
 * just gets major/minor fault counters bumped up.
 */

#define VM_FAULT_MINOR  0 /* For backwards compat. Remove me quickly. */

#define VM_FAULT_OOM    0x0001
#define VM_FAULT_SIGBUS 0x0002
#define VM_FAULT_MAJOR  0x0004
#define VM_FAULT_WRITE  0x0008  /* Special case for get_user_pages */

#define VM_FAULT_NOPAGE 0x0100  /* ->fault installed the pte, not return page */
#define VM_FAULT_LOCKED 0x0200  /* ->fault locked the returned page */

#define VM_FAULT_ERROR  (VM_FAULT_OOM | VM_FAULT_SIGBUS)


It seems the saner way to take care of this is to move this information into
tapset/memory.stp. However, direct equality comparisons might not work, because
VM_FAULT_NOPAGE or VM_FAULT_LOCKED may be OR'ed in. Also notice that
VM_FAULT_MINOR may be removed from future kernels. Would it make sense to have
functions that test whether a fault is a particular kind of fault:

function vm_fault_minor:long (fault_no:long)
function vm_fault_major:long (fault_no:long)
function vm_fault_oom:long (fault_no:long)
function vm_fault_sigbus:long (fault_no:long)
function vm_fault_error:long (fault_no:long)

optionally:

function vm_fault_nopage:long (fault_no:long)
function vm_fault_locked:long (fault_no:long)

What do people think about including these functions in tapset/memory.stp?

-Will


* Re: Return values for vm.pagefault.return changed with newer kernels
  2009-02-04 21:49 Return values for vm.pagefault.return changed with newer kernels William Cohen
@ 2009-02-05 16:56 ` Frank Ch. Eigler
  2009-02-05 21:07   ` William Cohen
  2009-02-12 16:25   ` William Cohen
  0 siblings, 2 replies; 5+ messages in thread
From: Frank Ch. Eigler @ 2009-02-05 16:56 UTC
  To: William Cohen; +Cc: SystemTAP


William Cohen <wcohen@redhat.com> writes:

> [...]
> It seems the saner way to take care of this is to move this information into
> tapset/memory.stp [...]
>
> function vm_fault_minor:long (fault_no:long)
> function vm_fault_major:long (fault_no:long)
> function vm_fault_oom:long (fault_no:long)
> function vm_fault_sigbus:long (fault_no:long)
> function vm_fault_error:long (fault_no:long)

I guess we need to speculate about future uses and possible future
changes of this stuff.  It would make about as much sense to have
something smaller API-wise:

a single test function
   function vm_fault_class_p:long (fault_number:long, class:string)
and a variable that lists available classes
   global vm_fault_classes:string []

... or else to have code that converts the old style enums to new
style bit masks in some tapset variable.


- FChE


* Re: Return values for vm.pagefault.return changed with newer kernels
  2009-02-05 16:56 ` Frank Ch. Eigler
@ 2009-02-05 21:07   ` William Cohen
  2009-02-05 22:12     ` Frank Ch. Eigler
  2009-02-12 16:25   ` William Cohen
  1 sibling, 1 reply; 5+ messages in thread
From: William Cohen @ 2009-02-05 21:07 UTC
  To: Frank Ch. Eigler; +Cc: SystemTAP

Frank Ch. Eigler wrote:
> William Cohen <wcohen@redhat.com> writes:
> 
>> [...]
>> It seems the saner way to take care of this is to move this information into
>> tapset/memory.stp [...]
>>
>> function vm_fault_minor:long (fault_no:long)
>> function vm_fault_major:long (fault_no:long)
>> function vm_fault_oom:long (fault_no:long)
>> function vm_fault_sigbus:long (fault_no:long)
>> function vm_fault_error:long (fault_no:long)
> 
> I guess we need to speculate about future uses and possible future
> changes of this stuff.  It would make about as much sense to have
> something smaller API-wise:
> 
> a single test function
>    function vm_fault_class_p:long (fault_number:long, class:string)
> and a variable that lists available classes
>    global vm_fault_classes:string []
> 
> ... or else to have code that converts the old style enums to new
> style bit masks in some tapset variable.
> 
> 
> - FChE

I was thinking about an alternative that converts the fault number into a
string, similar to the errno_str function in errno.stp:

function vm_fault_str:string (fault_no:long)

Multiple bits would be separated with |. Thus, vm_fault_str(0xc) would return
"VM_FAULT_MAJOR|VM_FAULT_WRITE".

-Will


* Re: Return values for vm.pagefault.return changed with newer kernels
  2009-02-05 21:07   ` William Cohen
@ 2009-02-05 22:12     ` Frank Ch. Eigler
  0 siblings, 0 replies; 5+ messages in thread
From: Frank Ch. Eigler @ 2009-02-05 22:12 UTC
  To: William Cohen; +Cc: SystemTAP

Hi -

On Thu, Feb 05, 2009 at 04:01:50PM -0500, William Cohen wrote:
> [...]
> I was thinking about an alternative that converted the fault number into a
> string, similar to the errno_str function in the errno.stp. [...]

Yeah, but that would force clients to do substring matching if they
wished to accumulate per-fault-class statistics.

- FChE


* Re: Return values for vm.pagefault.return changed with newer kernels
  2009-02-05 16:56 ` Frank Ch. Eigler
  2009-02-05 21:07   ` William Cohen
@ 2009-02-12 16:25   ` William Cohen
  1 sibling, 0 replies; 5+ messages in thread
From: William Cohen @ 2009-02-12 16:25 UTC
  To: SystemTAP

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

Frank Ch. Eigler wrote:
> William Cohen <wcohen@redhat.com> writes:
> 
>> [...]
>> It seems the saner way to take care of this is to move this information into
>> tapset/memory.stp [...]
>>
>> function vm_fault_minor:long (fault_no:long)
>> function vm_fault_major:long (fault_no:long)
>> function vm_fault_oom:long (fault_no:long)
>> function vm_fault_sigbus:long (fault_no:long)
>> function vm_fault_error:long (fault_no:long)
> 
> I guess we need to speculate about future uses and possible future
> changes of this stuff.  It would make about as much sense to have
> something smaller API-wise:
> 
> a single test function
>    function vm_fault_class_p:long (fault_number:long, class:string)
> and a variable that lists available classes
>    global vm_fault_classes:string []
> 
> ... or else to have code that converts the old style enums to new
> style bit masks in some tapset variable.
> 
> 
> - FChE

I worked out a similar function, vm_fault_contains, that tests against integer
constants (VM_FAULT_MAJOR, VM_FAULT_MINOR, etc.) instead of class strings, and
added it to the memory.stp tapset.

vm_fault_contains:long (value:long, test:long)

The patch adds the function to tapset/memory.stp and includes changes to
pfaults.stp to exercise the function. The attached patch has been verified to
work with the current 2.6.27 kernel on Fedora 10 and with the older kernel on
Red Hat Enterprise Linux 5.

Any comments on it would be appreciated.

-Will

[-- Attachment #2: pfaults.patch --]
[-- Type: text/plain, Size: 3905 bytes --]

diff --git a/tapset/memory.stp b/tapset/memory.stp
index 2d7f8b0..827017e 100644
--- a/tapset/memory.stp
+++ b/tapset/memory.stp
@@ -6,6 +6,44 @@
 // redistribute it and/or modify it under the terms of the GNU General
 // Public License (GPL); either version 2, or (at your option) any
 // later version.
+%{
+#include <linux/mm.h>
+%}
+
+global VM_FAULT_OOM=0, VM_FAULT_SIGBUS=1, VM_FAULT_MINOR=2, VM_FAULT_MAJOR=3
+global VM_FAULT_NOPAGE=4, VM_FAULT_LOCKED=5, VM_FAULT_ERROR=6
+
+function vm_fault_contains:long (value:long, test:long)
+%{
+	int res;
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,23)
+	switch (THIS->test){
+	case 0: res = THIS->value == VM_FAULT_OOM; break;
+	case 1: res = THIS->value == VM_FAULT_SIGBUS; break;
+	case 2: res = THIS->value == VM_FAULT_MINOR; break;
+	case 3: res = THIS->value == VM_FAULT_MAJOR; break;
+	default:
+		res = 0; break;
+	}
+#else
+	switch (THIS->test){
+	case 0: res = THIS->value & VM_FAULT_OOM; break;
+	case 1: res = THIS->value & VM_FAULT_SIGBUS; break;
+	case 2: /* VM_FAULT_MINOR inferred when these flags are all off */
+		res = !((VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_MAJOR) &
+			THIS->value);
+		break;
+	case 3: res = THIS->value & VM_FAULT_MAJOR; break;
+	case 4: res = THIS->value & VM_FAULT_NOPAGE; break;
+	case 5: res = THIS->value & VM_FAULT_LOCKED; break;
+	case 6: res = THIS->value & VM_FAULT_ERROR; break;
+	default:
+		res = 0;
+	}
+#endif
+	THIS->__retvalue = (res != 0);
+	return;
+%}
 
 /**
  * probe vm.pagefault - Records that a page fault occurred.
diff --git a/testsuite/systemtap.samples/pfaults.stp b/testsuite/systemtap.samples/pfaults.stp
index 577e93c..2b02c3e 100644
--- a/testsuite/systemtap.samples/pfaults.stp
+++ b/testsuite/systemtap.samples/pfaults.stp
@@ -8,51 +8,38 @@ probe vm.pagefault {
    # its exported global variable.
    pidnames[pid()] = execname()
 
-   faults [pid(), $write_access ? 1 : 0] ++
+   faults [pid(), write_access ? 1 : 0] += 1
 }
 
 probe vm.pagefault.return {
-  fault_types [pid(), $return] ++
+  p = pid()
+  fault_types[p, VM_FAULT_MINOR] += vm_fault_contains(fault_type,VM_FAULT_MINOR)
+  fault_types[p, VM_FAULT_MAJOR] += vm_fault_contains(fault_type,VM_FAULT_MAJOR)
 }
 
-
-# Some constants, to come from a future "VM tapset"
-
-global VM_FAULT_OOM, VM_FAULT_SIGBUS, VM_FAULT_MINOR, VM_FAULT_MAJOR
-probe begin {
-  VM_FAULT_OOM=-1
-  VM_FAULT_SIGBUS=0
-  VM_FAULT_MINOR=1
-  VM_FAULT_MAJOR=2
-}
-
-
-# Shut down the probing session after a while
-probe timer.ms(1000) { report() }
-probe timer.ms(10000) { exit() }
-
-function _(n) { return sprint(n) } # let's abbreviate
+probe timer.s(1) { report() }
 
 function report () {
-  print ("time=" . _(gettimeofday_s()) . "\n")
+  printf("time=%d\n", gettimeofday_s())
+  printf("%16s[%7s] %7s %7s %7s %7s\n",
+  	"exec", "pid", "reads", "writes",
+	"minor", "major")
   foreach ([pid] in pidnames) {
     if (faults[pid,0]+faults[pid,1] == 0) continue
-    print (pidnames[pid] . "[" . _(pid) . "]" .
-           " reads=" . _(faults[pid,0]) . 
-           " writes=" . _(faults[pid,1]) .
-           " oom=" . _(fault_types[pid,VM_FAULT_OOM]) .
-           " sigbus=" . _(fault_types[pid,VM_FAULT_SIGBUS]) .
-           " minor=" . _(fault_types[pid,VM_FAULT_MINOR]) .
-           " major=" . _(fault_types[pid,VM_FAULT_MAJOR]) .
-           "\n")
-    }
+    printf("%16s[%7d] %7d %7d %7d %7d\n",
+           pidnames[pid], pid,
+           faults[pid,0], faults[pid,1],
+	   fault_types[pid,VM_FAULT_MINOR],
+	   fault_types[pid,VM_FAULT_MAJOR])
+  }
+  delete pidnames
   delete faults
   delete fault_types
 }
 
 probe begin {
-  print ("Page fault tracking, start time=" . _(gettimeofday_s()) . "\n")
+  printf("Page fault tracking, start time=%d\n", gettimeofday_s())
 }
 probe end {
-  print ("Page fault tracking, end time=" . _(gettimeofday_s()) . "\n")
+  printf("Page fault tracking, end time=%d\n", gettimeofday_s())
 }


end of thread

Thread overview: 5+ messages
2009-02-04 21:49 Return values for vm.pagefault.return changed with newer kernels William Cohen
2009-02-05 16:56 ` Frank Ch. Eigler
2009-02-05 21:07   ` William Cohen
2009-02-05 22:12     ` Frank Ch. Eigler
2009-02-12 16:25   ` William Cohen
