* RE: [SCRIPT] NUMA page fault accounting.
From: Stone, Joshua I @ 2006-03-21 18:58 UTC
To: jrs; +Cc: systemtap
Jose R. Santos wrote:
> page_faults [pid(), $write_access ? 1 : 0] ++
> node_faults [pid(), addr_to_node($address)] ++
You could improve scalability of this script by using statistics to
maintain your count, e.g.:
page_faults [pid(), $write_access ? 1 : 0] <<< 1
node_faults [pid(), addr_to_node($address)] <<< 1
And then access the values with @count(page_faults[...]).
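To illustrate why the aggregate form tends to scale better, here is a minimal C sketch of the underlying idea. It is not SystemTap's actual implementation: each CPU adds to its own slot without locking, and the slots are summed only when the value is read, much as @count does. The names pcpu_count, pcpu_add, pcpu_read, and the fixed MAX_CPUS are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 4  /* hypothetical fixed CPU count for this sketch */

/* One slot per CPU, so concurrent writers never touch the same counter. */
struct pcpu_count {
    unsigned long slot[MAX_CPUS];
};

/* Record one event on the given CPU: the analogue of "stat <<< 1". */
static void pcpu_add(struct pcpu_count *c, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    c->slot[cpu]++;
}

/* Merge the per-CPU slots on demand: the analogue of @count(stat). */
static unsigned long pcpu_read(const struct pcpu_count *c)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < MAX_CPUS; i++)
        total += c->slot[i];
    return total;
}
```

The write path touches only CPU-local data, so the cost of merging is paid once at read time (here, in the end probe) rather than on every page fault.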
Other than that, this looks good. It might be nice to start publishing
case studies on the website, so if you have a real problem that you
solved with this, please share!
Josh
* Re: [SCRIPT] NUMA page fault accounting.
From: Jose R. Santos @ 2006-03-21 20:37 UTC
To: Stone, Joshua I; +Cc: systemtap
Stone, Joshua I wrote:
>Jose R. Santos wrote:
>> page_faults [pid(), $write_access ? 1 : 0] ++
>> node_faults [pid(), addr_to_node($address)] ++
>
>You could improve scalability of this script by using statistics to
>maintain your count, e.g.:
>
> page_faults [pid(), $write_access ? 1 : 0] <<< 1
> node_faults [pid(), addr_to_node($address)] <<< 1
>
>And then access the values with @count(page_faults[...]).
>
>
OK, I will play with this and send a revised script.
>Other than that, this looks good. It might be nice to start publishing
>case studies on the website, so if you have a real problem that you
>solved with this, please share!
>
>
>Josh
>
>
This script does not solve one particular problem; it is meant to narrow
down common issues that we have seen on some of our customer workloads.
I've been thinking of ways to use SystemTap in our performance area, and
one possibility I'm currently working on is a set of small scripts
designed to narrow down performance-related problems. The inspiration
behind this script is that we have had multiple cases where a customer
has brought us performance issues with their application on our servers
that are sometimes hard to narrow down. NUMA-related issues caused by
bad compiler optimization, non-optimal code design, or Linux kernel
issues have appeared more than once. This is the first of what I hope
will be many scripts designed with this purpose in mind.
I will also be working on a script called why_idle.stp, which will be
used to determine why a large system is not able to run at 100% CPU
capacity. This is another common problem that we have seen here.
Thanks for the comments
-JRS
* Re: [SCRIPT] NUMA page fault accounting.
From: Jose R. Santos @ 2006-03-22 15:57 UTC
To: Stone, Joshua I; +Cc: systemtap
Stone, Joshua I wrote:
>Jose R. Santos wrote:
>> page_faults [pid(), $write_access ? 1 : 0] ++
>> node_faults [pid(), addr_to_node($address)] ++
>
>You could improve scalability of this script by using statistics to
>maintain your count, e.g.:
>
> page_faults [pid(), $write_access ? 1 : 0] <<< 1
> node_faults [pid(), addr_to_node($address)] <<< 1
>
>And then access the values with @count(page_faults[...]).
>
>Other than that, this looks good. It might be nice to start publishing
>case studies on the website, so if you have a real problem that you
>solved with this, please share!
>
>
>Josh
>
>
Here is the updated script based on your suggestions. Unfortunately, the
machine that I use is an old pre-production box that gives me a lot of
variability between runs, even when I'm not running the SystemTap script,
so I can't verify whether the changes make the stream benchmark run any
faster.
Thanks
-JRS
#! /usr/local/bin/stap -g

global execnames, page_faults, node_faults

/* Map an address to the NUMA node whose pfn range contains it. */
function addr_to_node:long(addr:long)
%{
    int nid;
    /* NOTE: __pa() is defined for kernel linear-map addresses, while
     * $address is the faulting virtual address; verify the node mapping
     * on your platform. */
    /* unsigned long avoids truncating large pfns to int */
    unsigned long pfn = __pa(THIS->addr) >> PAGE_SHIFT;

    for_each_online_node(nid)
        if ( node_start_pfn(nid) <= pfn &&
             pfn < (node_start_pfn(nid) +
                    NODE_DATA(nid)->node_spanned_pages) )
        {
            THIS->__retvalue = nid;
            break;
        }
%}

probe kernel.function("__handle_mm_fault") {
    execnames[pid()] = execname()
    # Statistics aggregates: count faults per pid by access type and by node
    page_faults [pid(), $write_access ? 1 : 0] <<< 1
    node_faults [pid(), addr_to_node($address)] <<< 1
}

function print_pf () {
    print ("            Execname\t     PID\tRead Faults\tWrite Faults\n")
    print ("====================\t========\t===========\t============\n")
    foreach (pid in execnames) {
        printf ("%20s\t%8d\t%11d\t%12d\t", execnames[pid], pid,
                @count(page_faults[pid,0]),
                @count(page_faults[pid,1]))
        # Append the per-node counts for this pid, in ascending node order
        foreach ([pid2,node+] in node_faults) {
            if (pid2 == pid)
                printf ("Node[%d]=%d\t", node,
                        @count(node_faults[pid2, node]))
        }
        print ("\n")
    }
}

probe begin {
    print ("Starting pagefault counters \n")
}

probe end {
    print ("Printing counters: \n")
    print_pf ()
    print ("Done\n")
}
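The range check inside addr_to_node can be tried outside the kernel. The following is a minimal user-space C sketch under stated assumptions: struct node_range, pfn_to_node, phys_to_node, and the SKETCH_PAGE_SHIFT value of 12 are hypothetical stand-ins for node_start_pfn(), NODE_DATA(nid)->node_spanned_pages, and PAGE_SHIFT. Unlike the script, it returns -1 explicitly when no node spans the pfn.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12  /* assumed 4 KiB pages; the kernel's PAGE_SHIFT varies */

/* Hypothetical stand-in for one node's pfn span. */
struct node_range {
    unsigned long start_pfn;      /* like node_start_pfn(nid) */
    unsigned long spanned_pages;  /* like NODE_DATA(nid)->node_spanned_pages */
};

/* Return the index of the node whose span contains pfn, or -1 if none does. */
static int pfn_to_node(const struct node_range *nodes, size_t n_nodes,
                       unsigned long pfn)
{
    size_t nid;

    assert(nodes != NULL || n_nodes == 0);
    for (nid = 0; nid < n_nodes; nid++) {
        /* Half-open interval [start_pfn, start_pfn + spanned_pages). */
        if (nodes[nid].start_pfn <= pfn &&
            pfn - nodes[nid].start_pfn < nodes[nid].spanned_pages)
            return (int)nid;
    }
    return -1;
}

/* Convert a physical address to a pfn, then look up its node. */
static int phys_to_node(const struct node_range *nodes, size_t n_nodes,
                        unsigned long long phys)
{
    return pfn_to_node(nodes, n_nodes,
                       (unsigned long)(phys >> SKETCH_PAGE_SHIFT));
}
```

With a two-node table such as { {0, 1024}, {1024, 1024} }, pfn 1023 falls in node 0 and pfn 1024 in node 1. Keeping the pfn as unsigned long, rather than int, avoids truncation on machines with very large physical memory.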
* [SCRIPT] NUMA page fault accounting.
From: Jose R. Santos @ 2006-03-21 16:44 UTC
To: systemtap
Hi folks,
My team has needed to analyze page faults on a per-NUMA-node basis, and
I've come up with this simple SystemTap script that does this for me.
I only have access to a PPC64 NUMA box, but I took care to use generic,
architecture-independent NUMA code, and I think it should work for any
kernel that has NUMA enabled. Here is a short sample of the output from
running the stream benchmark with a 4.5GB array on an 8GB system.
Starting pagefault counters
Printing counters:
            Execname       PID  Read Faults  Write Faults
====================  ========  ===========  ============
              stream     22786           33        668186  Node[0]=521176  Node[1]=147043
                stpd     22803            1             3  Node[0]=3  Node[1]=1
and the script:
#! stap -g

global execnames, page_faults, node_faults

/* Map an address to the NUMA node whose pfn range contains it. */
function addr_to_node:long(addr:long)
%{
    int nid;
    /* NOTE: __pa() is defined for kernel linear-map addresses, while
     * $address is the faulting virtual address; verify the node mapping
     * on your platform. */
    /* unsigned long avoids truncating large pfns to int */
    unsigned long pfn = __pa(THIS->addr) >> PAGE_SHIFT;

    for_each_online_node(nid)
        if ( node_start_pfn(nid) <= pfn &&
             pfn < (node_start_pfn(nid) +
                    NODE_DATA(nid)->node_spanned_pages) )
        {
            THIS->__retvalue = nid;
            break;
        }
%}

probe kernel.function("__handle_mm_fault") {
    execnames[pid()] = execname()
    # Count faults per pid by access type (0 = read, 1 = write) and by node
    page_faults [pid(), $write_access ? 1 : 0] ++
    node_faults [pid(), addr_to_node($address)] ++
}

function print_pf () {
    print ("            Execname\t     PID\tRead Faults\tWrite Faults\n")
    print ("====================\t========\t===========\t============\n")
    foreach (pid in execnames) {
        printf ("%20s\t%8d\t%11d\t%12d\t", execnames[pid], pid,
                page_faults[pid,0], page_faults[pid,1])
        # Append the per-node counts for this pid, in ascending node order
        foreach ([pid2,node+] in node_faults) {
            if (pid2 == pid)
                printf ("Node[%d]=%d\t", node,
                        node_faults[pid2, node])
        }
        print ("\n")
    }
}

probe begin {
    print ("Starting pagefault counters \n")
}

probe end {
    print ("Printing counters: \n")
    print_pf ()
    print ("Done\n")
}
Comments welcome - Enjoy
-JRS