Re: Better pagecache statistics ?

public inbox for systemtap@sourceware.org

* Re: Better pagecache statistics ?
       [not found]                 ` <1133567206.21429.117.camel@localhost.localdomain>
@ 2005-12-03  1:59                   ` Frank Ch. Eigler
  0 siblings, 0 replies; 4+ messages in thread
From: Frank Ch. Eigler @ 2005-12-03  1:59 UTC
  To: Badari Pulavarty; +Cc: systemtap

Hi -

(Redirected from http://lkml.org/lkml/2005/12/1/182)

Badari Pulavarty wrote:

> [...]  Is there a way for another user-level program/utility to
> access some of the data maintained in those arrays ?

Not really.  One possibility is an on-demand /proc interface outlined
in systemtap bug #1154.

> [...]
> Does this mean that I can do something like
> 	page_cache[0xffff8100c4c6b298] = $mapping->nrpages ?
> And this won't generate bloated arrays ?

If by "bloat" you mean "trying to allocate 2**64 elements", then no,
it won't do that.  Systemtap associative arrays are more like hash
tables.

> [...]  Unfortunately, I can't capture whatever happened before
> inserting the probe. So it won't give me information about
> everything that is in the pagecache.

Until other mechanisms become available, one could perhaps start the
probe early on during boot.

> BTW, if you prefer - we can move the discussion to systemtap.

Done.

- FChE


* Re: Better pagecache statistics ?
  2005-12-28 17:08   ` Marcelo Tosatti
  2005-12-28 19:21     ` Tom Zanussi
@ 2005-12-29  6:53     ` Frank Ch. Eigler
  1 sibling, 0 replies; 4+ messages in thread
From: Frank Ch. Eigler @ 2005-12-29  6:53 UTC
  To: Marcelo Tosatti; +Cc: systemtap

Hi -

> [...]
> a) nanosecond timekeeping 
> Since the systemtap language does not support "struct" abstraction, but
> simply "long/string/array" types, there is no way to easily return more
> than one value from a function. Is it possible to pass references down
> to functions so as to return more than one value? 

We don't support references (this enables stricter and simpler
checking), nor multiple return values (though we could - suggest a
good syntax for extracting tuple parts!).

> [...]
> For nanosecond timekeeping one needs second/nanosecond tuple (struct
> timespec).

Are you sure you need something beyond the 64-bit signed integers that
systemtap uses for all its numerics?
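
As a rough illustration of why a single 64-bit integer may be enough,
the sketch below folds a seconds/nanoseconds pair into one nanosecond
count and back. It is written in Python only because that is easy to
run; it is not SystemTap code, and the helper names are made up for
this example. A signed 64-bit count of nanoseconds spans roughly 292
years, so timestamps counted from the 1970 epoch fit with room to spare.

```python
NSEC_PER_SEC = 1_000_000_000
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def timespec_to_ns(sec, nsec):
    """Fold a (seconds, nanoseconds) pair into one nanosecond count."""
    return sec * NSEC_PER_SEC + nsec


def ns_to_timespec(ns):
    """Split a nanosecond count back into (seconds, nanoseconds)."""
    return divmod(ns, NSEC_PER_SEC)


# Range of a signed 64-bit nanosecond counter, in 365.25-day years.
range_years = INT64_MAX / NSEC_PER_SEC / (365.25 * 24 * 3600)

# A timestamp from late December 2005 (seconds since the epoch) plus a
# nanosecond fraction, held in a single integer.
stamp = timespec_to_ns(1135789680, 123456789)
```

Differences between two such values are plain integer subtractions, so
no tuple return is needed to compute a latency.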

> b) ERROR: MAXACTION exceeded near identifier 'log' at ttfp_delay.stp:49:3
> The array size is capped to a maximum.

The MAXACTION limit applies to the number of statements executed per
probe hit, not to the reserved size of arrays (that is MAXMAPENTRIES).
Both values can be overridden from the stap command line using -D.

> Is there any way to configure SystemTap to periodically
> dump-and-zero the arrays? This makes lots of sense to any
> statistical gathering code.

As mentioned in the other message, a script programmer can do this
with several kinds of explicit code.

> c) Hash tables It would be better to store the log entries in a hash
> table, the present script uses the "current" pointer as a key into a
> pair of arrays, [...]

Systemtap arrays *are* associative, and use hash tables.  Consider
using the pid() or tgid() value to index into them.

> And finally, there seems to be a bug which results in _very_ large
> (several seconds) delays - that seems unlikely to really be happening.

That is strange, though I recall hearing of vm problems that might
manifest themselves like that.  Or maybe your "current"-pointer based
indexing is unintentionally colliding.  Consider setting extra probe
points deeper down.
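
One way the slot-probing scheme could produce multi-second deltas,
offered as a guess rather than a diagnosis: the output in the original
message reports 387 calls but only 373 returns, and if a return is ever
missed, the entry and exit arrays fall out of step. The sketch below
models that in Python (not SystemTap; the task value and timestamps are
invented) and compares it with keeping one pending start time per task.

```python
def record(table, key, timestamp):
    """Store timestamp in the first free slot at or after key.

    Mirrors `while (array[current_p]) ++current_p;` from the script:
    a key that was never set reads as 0, which counts as free.
    """
    while table.get(key, 0):
        key += 1
    table[key] = timestamp


TASK = 1000  # stand-in for one task's "current" pointer value

# (entry_us, exit_us) for three calls by the same task.
# None means the return probe never fired for that call.
calls = [(100, 150), (200, None), (10_000_000, 10_000_040)]

# Scheme 1: separate entry/exit arrays, each probed for its own free slot.
entry_us, exit_us = {}, {}
for start, end in calls:
    record(entry_us, TASK, start)
    if end is not None:
        record(exit_us, TASK, end)
slot_deltas = [exit_us.get(k, 0) - entry_us[k] for k in sorted(entry_us)]

# Scheme 2: one pending start time per task, delta computed at return.
pending, paired_deltas = {}, []
for start, end in calls:
    pending[TASK] = start  # a stale start is simply overwritten
    if end is not None:
        paired_deltas.append(end - pending.pop(TASK))

print(slot_deltas)    # [50, 9999840, -10000000]
print(paired_deltas)  # [50, 40]
```

In the first scheme the third call's return lands in the second call's
slot, which yields a bogus delta of nearly ten seconds. In a script, the
second scheme would correspond to indexing a single array by tid() and
computing the delta inside the return probe.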

- FChE


* Re: Better pagecache statistics ?
  2005-12-28 17:08   ` Marcelo Tosatti
@ 2005-12-28 19:21     ` Tom Zanussi
  2005-12-29  6:53     ` Frank Ch. Eigler
  1 sibling, 0 replies; 4+ messages in thread
From: Tom Zanussi @ 2005-12-28 19:21 UTC
  To: Marcelo Tosatti; +Cc: Badari Pulavarty, fche, linux-mm, lkml, systemtap

Marcelo Tosatti writes:

[...]

 > 
 > b) ERROR: MAXACTION exceeded near identifier 'log' at ttfp_delay.stp:49:3
 > 
 > The array size is capped to a maximum. Is there any way to configure
 > SystemTap to periodically dump-and-zero the arrays? This makes lots of
 > sense to any statistical gathering code.
 > 
 > c) Hash tables
 > 
 > It would be better to store the log entries in a hash table, the present
 > script uses the "current" pointer as a key into a pair of arrays,
 > incrementing the key until a free one is found (which can be very
 > inefficient).
 > 
 > A hash table would be much more efficient, but allocating memory inside
 > the scripts is tricky. A pre-allocated, pre-sized pool of memory could 
 > work well for this purpose. The "dump-array-entries-to-userspace" action
 > could be used to free them.
 > 
 > So both b) and c) could be fixed with the same logic:
 > 
 > - dump entries to userspace if memory pool is getting short 
 > on free entries.
 > - periodically dump entries to userspace (akin to "bdflush").

Hi,

There's a systemtap example that does something similar to what you're
describing - see the kmalloc-stacks/kmalloc-top examples in the
testsuite:

systemtap/tests/systemtap.samples/kmalloc-stacks.stp
systemtap/tests/systemtap.samples/kmalloc-top

Basically, the kmalloc-stacks.stp script hashes data in a systemtap
hash and periodically formats the current contents of the hash table
into a convenient form and writes it to userspace, then clears the
hash for the next go-round.  kmalloc-top is a companion Perl script
'daemon' that sits around in userspace waiting for new batches of hash
data, which it then adds to a continuously accumulating Perl hash in
the user-side script.  There's a bit more detail about the script(s)
here:

http://sourceware.org/ml/systemtap/2005-q3/msg00550.html
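
The user-side half of that dump-and-clear pattern can be sketched
roughly as follows. This is not the kmalloc-top code: it is Python
rather than Perl, the "key count" line format is an assumption made for
the example, and the keys are invented. It only shows how batches that
were cleared on the kernel side can keep accumulating in userspace.

```python
from collections import Counter


def parse_batch(text):
    """Parse one dumped batch of 'key count' lines into a Counter."""
    batch = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace: key on the left, count on the right.
        key, count = line.rsplit(None, 1)
        batch[key] += int(count)
    return batch


totals = Counter()  # user-side running totals across all batches

# Two batches as the script side might emit them, each written out
# just before the in-kernel table is cleared (hypothetical data).
batches = [
    "kmem_cache_alloc 3\nalloc_skb 5\n",
    "alloc_skb 2\nget_zeroed_page 1\n",
]
for text in batches:
    totals.update(parse_batch(text))  # Counter.update adds counts per key
```

Because each batch is merged into the running totals, the kernel-side
table only ever has to hold one interval's worth of entries.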

HTH,

Tom



* Re: Better pagecache statistics ?
       [not found] ` <20051201152029.GA14499@dmt.cnet>
@ 2005-12-28 17:08   ` Marcelo Tosatti
  2005-12-28 19:21     ` Tom Zanussi
  2005-12-29  6:53     ` Frank Ch. Eigler
  0 siblings, 2 replies; 4+ messages in thread
From: Marcelo Tosatti @ 2005-12-28 17:08 UTC
  To: Badari Pulavarty, fche; +Cc: linux-mm, lkml, systemtap

Badari, any improvements on the {add_to,remove_from}_page_cache hooks?

> I just started playing with SystemTap yesterday. First
> thing I want to record is "what is the latency of 
> direct reclaim".

I've come up with something which works, though pretty dumb and
inefficient.

I'm facing three problems, maybe someone has a clue on how to improve 
the situation.

a) nanosecond timekeeping 

Since the systemtap language does not support "struct" abstraction, but
simply "long/string/array" types, there is no way to easily return more
than one value from a function. Is it possible to pass references down
to functions so as to return more than one value? 

I failed to find any way to do that.

For nanosecond timekeeping one needs second/nanosecond tuple (struct
timespec).

b) ERROR: MAXACTION exceeded near identifier 'log' at ttfp_delay.stp:49:3

The array size is capped to a maximum. Is there any way to configure
SystemTap to periodically dump-and-zero the arrays? This makes lots of
sense to any statistical gathering code.

c) Hash tables

It would be better to store the log entries in a hash table, the present
script uses the "current" pointer as a key into a pair of arrays,
incrementing the key until a free one is found (which can be very
inefficient).

A hash table would be much more efficient, but allocating memory inside
the scripts is tricky. A pre-allocated, pre-sized pool of memory could 
work well for this purpose. The "dump-array-entries-to-userspace" action
could be used to free them.

So both b) and c) could be fixed with the same logic:

- dump entries to userspace if memory pool is getting short 
on free entries.
- periodically dump entries to userspace (akin to "bdflush").

And finally, there seems to be a bug which results in _very_ large
(several seconds) delays - that seems unlikely to really be happening.

Thoughts?

/* 
 * ttfp_delay - measure direct reclaim latency 
 */

global count_try_to_free_pages
global count_exit_try_to_free_pages

global entry_array_us
global exit_array_us

global entry_array_ms
global exit_array_ms

function get_currentpointer:long () %{
	/* cast to long, not int, so the pointer is not truncated on 64-bit kernels */
	THIS->__retvalue = (long) current;
%}

probe kernel.function("try_to_free_pages")
{
	current_p = get_currentpointer();
	++count_try_to_free_pages;
	while (entry_array_us[current_p])
		++current_p;

	entry_array_us[current_p] = gettimeofday_us();
	entry_array_ms[current_p] = gettimeofday_ms();
}

probe kernel.function("try_to_free_pages").return
{
	current_p = get_currentpointer();
	++count_exit_try_to_free_pages;
	while (exit_array_us[current_p])
		++current_p;

	exit_array_us[current_p] = gettimeofday_us();
	exit_array_ms[current_p] = gettimeofday_ms();
}

probe begin { log("starting probe") }

probe end
{
	log("ending probe")
	log ("calls to try_to_free_pages: " . string(count_try_to_free_pages));
	log ("returns from try_to_free_pages: " . string(count_exit_try_to_free_pages));
	foreach(var in entry_array_us) {
		pos++;
		log ("try_to_free_pages (" . string(pos) .  ") delta: " . string(exit_array_us[var] - entry_array_us[var]) . "us " .string (exit_array_ms[var] - entry_array_ms[var]) . "ms ");
	}

}

Example output, while running an 800MB "dd" copy in the background:

[root@dmt examples]# stap -g ttfp_delay.stp 
starting probe
ending probe
calls to try_to_free_pages: 387
returns from try_to_free_pages: 373
try_to_free_pages (1) delta: 15028us 15ms 
try_to_free_pages (2) delta: 47677211us 47677ms 
try_to_free_pages (3) delta: 39us 0ms 
try_to_free_pages (4) delta: 35us 0ms 
try_to_free_pages (5) delta: 152us 0ms 
try_to_free_pages (6) delta: 104us 0ms 
try_to_free_pages (7) delta: 353us 0ms 
try_to_free_pages (8) delta: 61us 0ms 
try_to_free_pages (9) delta: 187us 0ms 
try_to_free_pages (10) delta: 55us 0ms 
try_to_free_pages (11) delta: 50us 0ms 
try_to_free_pages (12) delta: 30us 0ms 
try_to_free_pages (13) delta: 31us 0ms 
try_to_free_pages (14) delta: 42us 0ms 
try_to_free_pages (15) delta: 37us 0ms 
try_to_free_pages (16) delta: 178us 0ms 
try_to_free_pages (17) delta: 34us 0ms 
try_to_free_pages (18) delta: 37us 0ms 
try_to_free_pages (19) delta: 35us 0ms 
try_to_free_pages (20) delta: 34us 0ms 
try_to_free_pages (21) delta: 65us 0ms 
...

