public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* systemtap/pcp integration
@ 2014-07-17 21:14 David Smith
  2014-07-18 15:49 ` Frank Ch. Eigler
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: David Smith @ 2014-07-17 21:14 UTC (permalink / raw)
  To: Systemtap List, pcp

[-- Attachment #1: Type: text/plain, Size: 1035 bytes --]

Here's a small update on the prototype systemtap/pcp integration work
I'm doing. I've create a systemtap branch, called 'dsmith/mmv' that
contains all my work. Basically this work allows systemtap to create
'mmv' memory mapped files.

If you checkout and build that, then you should be able to run the
attached systemtap script. This script is a translation of pcp's example
python mmv script found in src/python/mmv.py.

Note that systemtap will create a file called 'mmv' in
/proc/systemtap/{MODULE_NAME}. I've just been using pcp's 'mmvdump'
utility to dump the contents of the /proc/systemtap/{MODULE_NAME}/mmv
file. Currently the pcp mmv pmda only looks in one place for mmv files,
but it might be possible to create a symbolic link to systemtap's mmv
file to make it happy.

The code works for the attached script, but I'm sure it is quite
fragile. Things like locking, error checking, documentation, etc. need
to be done.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

[-- Attachment #2: mmv.stp --]
[-- Type: text/plain, Size: 1286 bytes --]

global inst0, inst1, counter_metric, instant_metric, indom_metric
global instant_value

probe begin
{
	inst0 = mmv_add_instance(0, "zero")
	inst1 = mmv_add_instance(1, "hero")
	printf("instances: %d, %d\n", inst0, inst1)

	indom0 = mmv_add_indom(1, "We can be heroes",
			       "Set of instances from zero to hero")
 	mmv_add_indom_instance(indom0, inst0)
 	mmv_add_indom_instance(indom0, inst1)

	printf("indom: %d\n", indom0)

	counter_metric = mmv_add_metric("counter", 1, MMV_TYPE_NUMBER,
				 	MMV_SEM_COUNTER,
					mmv_units(0, 0, 1, 0, 0, 0),
					0, "Example counter metric",
					"Yep, a test counter metric")
	instant_metric = mmv_add_metric("instant", 2, MMV_TYPE_NUMBER,
					MMV_SEM_INSTANT,
					mmv_units(0, 0, 0, 0, 0, 0), 0,
					"Example instant metric",
					"Yep, a test instantaneous metric")
	indom_metric = mmv_add_metric("indom", 3, MMV_TYPE_NUMBER,
				      MMV_SEM_DISCRETE,
				      mmv_units(0, 0, 0, 0, 0, 0),
				      1, "", "")
	printf("metrics: %d, %d, %d\n", counter_metric, instant_metric,
	       indom_metric)

	mmv_stats_start(42, 0)

	instant_value = mmv_lookup_value(instant_metric, 0)
	mmv_set_value(instant_value, 41)
	mmv_inc_value(instant_value, 2)
}

probe timer.s(1)
{
	mmv_inc_value(instant_value, 3)
}

probe end
{
	mmv_stats_stop()
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-17 21:14 systemtap/pcp integration David Smith
@ 2014-07-18 15:49 ` Frank Ch. Eigler
  2014-07-18 17:02   ` David Smith
  2014-07-18 21:10 ` [pcp] " William Cohen
  2014-07-22  1:32 ` Nathan Scott
  2 siblings, 1 reply; 16+ messages in thread
From: Frank Ch. Eigler @ 2014-07-18 15:49 UTC (permalink / raw)
  To: David Smith; +Cc: Systemtap List, pcp


Hi, David -

> Here's a small update on the prototype systemtap/pcp integration work
> I'm doing. 

Thanks, what a great start.

> I've create a systemtap branch, called 'dsmith/mmv' that contains
> all my work. Basically this work allows systemtap to create 'mmv'
> memory mapped files.

(That's git://sourceware.org/git/systemtap.git branch dsmith/mmv.)

> [...] The code works for the attached script, but I'm sure it is
> quite fragile. Things like locking, error checking, documentation,
> etc. need to be done. [...]

Overall, are you happy with the general approach of reusing the exact
MMV format (and thus the PMDA)?

At one point I suggested reworking the earlier prototype so that the
bulk of the MMV format's emulation would be based on tapset script
code (and possibly more declarative / dynamic / safe) rather than C.
Have you come to any conclusions about the propriety of that?

How much post-initialization change can the MMV format tolerate, as
regarding indom contents or metric availability?  I assume such
metadata changes are synchronized with the PMDA via the generation
numbers.  Moving around contents of the mmap region as in
__stp_mmv_alloc_data_item sounds like it leaves the data inconsistent
during the process; does it need similar protection?

- FChE

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-18 15:49 ` Frank Ch. Eigler
@ 2014-07-18 17:02   ` David Smith
  2014-07-18 18:27     ` Frank Ch. Eigler
  0 siblings, 1 reply; 16+ messages in thread
From: David Smith @ 2014-07-18 17:02 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Systemtap List, pcp

On 07/18/2014 10:49 AM, Frank Ch. Eigler wrote:
> 
> Hi, David -
> 
>> Here's a small update on the prototype systemtap/pcp integration work
>> I'm doing. 
> 
> Thanks, what a great start.
> 
>> I've create a systemtap branch, called 'dsmith/mmv' that contains
>> all my work. Basically this work allows systemtap to create 'mmv'
>> memory mapped files.
> 
> (That's git://sourceware.org/git/systemtap.git branch dsmith/mmv.)
>
>> [...] The code works for the attached script, but I'm sure it is
>> quite fragile. Things like locking, error checking, documentation,
>> etc. need to be done. [...]
> 
> Overall, are you happy with the general approach of reusing the exact
> MMV format (and thus the PMDA)?

That's a good question. I certainly started with reusing the exact MMV
format because that's a working format with an existing data consumer on
the pcp side of things.

Note that the current systemtap implementation is actually a subset of
the full pcp MMV format - currently it only supports 64-bit integers
(with string values to come).

However, as I've worked with the MMV format I've come to realize its
limitations. As Nathan has pointed out in another email, the MMV format
is designed to only support exporting values, and isn't suited for more
event-like tracing. As far as the more technical side of things goes,
some of the internal offset logic might be done better/differently.

> At one point I suggested reworking the earlier prototype so that the
> bulk of the MMV format's emulation would be based on tapset script
> code (and possibly more declarative / dynamic / safe) rather than C.
> Have you come to any conclusions about the propriety of that?

I've been focused on other things, like reworking the allocation logic.
As you describe it above, I'm not sure I see where you are headed.
Certainly when I add systemtap's dyninst support into the picture, I was
planning on sharing that code between systemtap's linux and dyninst
runtimes.

> How much post-initialization change can the MMV format tolerate, as
> regarding indom contents or metric availability?  I assume such
> metadata changes are synchronized with the PMDA via the generation
> numbers.  Moving around contents of the mmap region as in
> __stp_mmv_alloc_data_item sounds like it leaves the data inconsistent
> during the process; does it need similar protection?

The MMV format doesn't support any post-initialization changes - once
you call "start" the file format can't change without removing and
recreating the file. (The reader knows "start" has been called based on
when the generation numbers match.)

As far as systemtap is concerned, all the memory moving that
__stp_mmv_alloc_data_item() does is done before calling "start", so the
data being inconsistent as far as the reader is concerned doesn't
matter. For the linux runtime side of things, we could not allow reads
of the memory-mapped data until "start" has been called (and the data is
consistent).

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-18 17:02   ` David Smith
@ 2014-07-18 18:27     ` Frank Ch. Eigler
  2014-07-22 14:25       ` David Smith
  0 siblings, 1 reply; 16+ messages in thread
From: Frank Ch. Eigler @ 2014-07-18 18:27 UTC (permalink / raw)
  To: David Smith; +Cc: Systemtap List, pcp

Hi -

dsmith wrote:

> [...]
> > Overall, are you happy with the general approach of reusing the exact
> > MMV format (and thus the PMDA)?
> 
> [...]
> However, as I've worked with the MMV format I've come to realize its
> limitations. As Nathan has pointed out in another email, the MMV format
> is designed to only support exporting values, and isn't suited for more
> event-like tracing. As far as the more technical side of things goes,
> some of the internal offset logic might be done better/differently.

An application of pmda/mmv & pmda/logger to the same stap module could
perhaps accomplish both goals (assuming we consider the pcp events
overengineered to the extent that supplying timestamped strings is
sufficient).  Have you considered an alternative unified design?

This reminds me of another PCP PMDA we've mentioned in the past: a
JSON fetcher/parser.  We'll need something like this for a variety of
non-systemtap purposes too (interop with JSON-producing tools).  What
if stap were to produce pcp metrics in the form of /proc/systemtap/*
JSON files that the PMDA would read on demand?  (The cost of the
parsing overhead may be low enough not to worry about it.)  A separate
generated JSON file could provide metadata.  That format could be rich
enough to contain events too (mapped from arrays of string).


> > At one point I suggested reworking the earlier prototype so that the
> > bulk of the MMV format's emulation would be based on tapset script
> > code (and possibly more declarative / dynamic / safe) rather than C.
> > Have you come to any conclusions about the propriety of that?
> 
> I've been focused on other things, like reworking the allocation logic.
> As you describe it above, I'm not sure I see where you are headed.

To spell it out, the idea was to encode the mmv format logic
(including metadata management) within a stap tapset script instead of
as C in the runtime.  Then the C runtime would need to do nothing but
provide a memory-mapped-byte-array kind of abstraction, and a way for
the script code to read/write it (maybe a variant of
sprintf("%b...")?).


> [...]
> > How much post-initialization change can the MMV format tolerate, as
> > regarding indom contents or metric availability?  I assume such
> > metadata changes are synchronized with the PMDA via the generation
> > numbers.  Moving around contents of the mmap region as in
> > __stp_mmv_alloc_data_item sounds like it leaves the data inconsistent
> > during the process; does it need similar protection?
> 
> The MMV format doesn't support any post-initialization changes - once
> you call "start" the file format can't change without removing and
> recreating the file. (The reader knows "start" has been called based on
> when the generation numbers match.) [...]

(I think generation numbers can be changed during the run, to trigger
a pmda/mmv reload, but don't know how thoroughly that works.)


- FChE

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pcp] systemtap/pcp integration
  2014-07-17 21:14 systemtap/pcp integration David Smith
  2014-07-18 15:49 ` Frank Ch. Eigler
@ 2014-07-18 21:10 ` William Cohen
  2014-07-21 15:43   ` David Smith
  2014-07-22  1:32 ` Nathan Scott
  2 siblings, 1 reply; 16+ messages in thread
From: William Cohen @ 2014-07-18 21:10 UTC (permalink / raw)
  To: David Smith, Systemtap List, pcp

On 07/17/2014 05:14 PM, David Smith wrote:
> Here's a small update on the prototype systemtap/pcp integration work
> I'm doing. I've create a systemtap branch, called 'dsmith/mmv' that
> contains all my work. Basically this work allows systemtap to create
> 'mmv' memory mapped files.
> 
> If you checkout and build that, then you should be able to run the
> attached systemtap script. This script is a translation of pcp's example
> python mmv script found in src/python/mmv.py.
> 
> Note that systemtap will create a file called 'mmv' in
> /proc/systemtap/{MODULE_NAME}. I've just been using pcp's 'mmvdump'
> utility to dump the contents of the /proc/systemtap/{MODULE_NAME}/mmv
> file. Currently the pcp mmv pmda only looks in one place for mmv files,
> but it might be possible to create a symbolic link to systemtap's mmv
> file to make it happy.
> 
> The code works for the attached script, but I'm sure it is quite
> fragile. Things like locking, error checking, documentation, etc. need
> to be done.
> 
> 
> 
> _______________________________________________
> pcp mailing list
> pcp@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/pcp
> 

Hi David,

I was able to get this to work on RHEL 7.  However, on fedora20 the mmvdump gets a sigsegv in the dump() function.

-Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pcp] systemtap/pcp integration
  2014-07-18 21:10 ` [pcp] " William Cohen
@ 2014-07-21 15:43   ` David Smith
  2014-07-21 15:54     ` William Cohen
  0 siblings, 1 reply; 16+ messages in thread
From: David Smith @ 2014-07-21 15:43 UTC (permalink / raw)
  To: William Cohen, Systemtap List, pcp

On 07/18/2014 04:10 PM, William Cohen wrote:
> Hi David,
> 
> I was able to get this to work on RHEL 7.  However, on fedora20 the mmvdump
> gets a sigsegv in the dump() function.

Thanks for the bug report. I believe I just fixed this one, so after
updating let me know if it doesn't work for you.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pcp] systemtap/pcp integration
  2014-07-21 15:43   ` David Smith
@ 2014-07-21 15:54     ` William Cohen
  0 siblings, 0 replies; 16+ messages in thread
From: William Cohen @ 2014-07-21 15:54 UTC (permalink / raw)
  To: David Smith, Systemtap List, pcp

On 07/21/2014 11:43 AM, David Smith wrote:
> On 07/18/2014 04:10 PM, William Cohen wrote:
>> Hi David,
>>
>> I was able to get this to work on RHEL 7.  However, on fedora20 the mmvdump
>> gets a sigsegv in the dump() function.
> 
> Thanks for the bug report. I believe I just fixed this one, so after
> updating let me know if it doesn't work for you.
> 

Hi David,

That fix made fedora 20 happy.  In one window did:

$ ./systemtap_write/install/bin/stap mmv.stp 
instances: 0, 1
indom: 0
metrics: 0, 1, 2

In another ran the mmvdump from pcp:

$ ./mmvdump /proc/systemtap/stap_cb2f18a783870e1474713324d01a25f_25939/mmv 
MMV file   = /proc/systemtap/stap_cb2f18a783870e1474713324d01a25f_25939/mmv
Version    = 1
Generated  = 769479
TOC count  = 5
Cluster    = 42
Process    = 0
Flags      = 0x0

TOC[0]: offset 40, indoms offset 1848 (1 entries)
  [1/1848] 2 instances, starting at offset 152
       shorttext=We can be heroes
       helptext=Set of instances from zero to hero

TOC[1]: offset 56, instances offset 152 (2 entries)
  [1/152] instance = [0 or "zero"]
  [1/232] instance = [1 or "hero"]

TOC[2]: toc offset 72, metrics offset 1880 (3 entries)
  [1/1880] counter
       type=64-bit int (0x2), sem=counter (0x1), pad=0x0
       units=count
       (no indom)
       shorttext=Example counter metric
       helptext=Yep, a test counter metric
  [2/1984] instant
       type=64-bit int (0x2), sem=instant (0x3), pad=0x0
       units=
       (no indom)
       shorttext=Example instant metric
       helptext=Yep, a test instantaneous metric
  [3/2088] indom
       type=64-bit int (0x2), sem=discrete (0x4), pad=0x0
       units=
       indom=1
       (no shorttext)
       (no helptext)

TOC[3]: offset 88, values offset 2192 (4 entries)
  [1/2192] counter = 0
  [2/2224] instant = 325
  [3/2256] indom[0 or "zero"] = 0
  [3/2288] indom[0 or "zero"] = 0

TOC[4]: offset 104, string offset 312 (6 entries)
  [1/312] We can be heroes
  [2/568] Set of instances from zero to hero
  [3/824] Example counter metric
  [4/1080] Yep, a test counter metric
  [5/1336] Example instant metric
  [6/1592] Yep, a test instantaneous metric

-Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pcp] systemtap/pcp integration
  2014-07-17 21:14 systemtap/pcp integration David Smith
  2014-07-18 15:49 ` Frank Ch. Eigler
  2014-07-18 21:10 ` [pcp] " William Cohen
@ 2014-07-22  1:32 ` Nathan Scott
  2014-07-22  2:57   ` Frank Ch. Eigler
  2014-07-22 14:50   ` [pcp] " David Smith
  2 siblings, 2 replies; 16+ messages in thread
From: Nathan Scott @ 2014-07-22  1:32 UTC (permalink / raw)
  To: David Smith; +Cc: Systemtap List, pcp

[-- Attachment #1: Type: text/plain, Size: 2458 bytes --]

Hi David,

----- Original Message -----
> [...]
> Note that systemtap will create a file called 'mmv' in
> /proc/systemtap/{MODULE_NAME}. I've just been using pcp's 'mmvdump'
> utility to dump the contents of the /proc/systemtap/{MODULE_NAME}/mmv
> file. Currently the pcp mmv pmda only looks in one place for mmv files,
> but it might be possible to create a symbolic link to systemtap's mmv
> file to make it happy.

(OOC, what's {MODULE_NAME} in this context?)

A symlink would sorta work but it feels like a bit of a workaround - the
PMDA is written to be able to detect arrival/departure of new MMV files
based on changes in a directory (and the location of that directory is
parameterised via /etc/pcp.conf variables).  I'd not recommend trying to
find it within a stap script ... I imagine it will be cleaner if we go
for separate user/kernel locations for MMV files.

Attached patch (lightly tested) implements the PCP side of things with
that in mind - with this patch (and making the kernel code manage the
lifecycle of separate /proc/mmv/* entries), things should begin to work
out-of-the-box (the MMV PMDA is already default-enabled in the default
pmcd config file, so everything else is in place for you).

It also occurs to me that there's not really anything systemtap-specific
about the kernel MMV instrumentation you've done, or is there?  A device
driver author might want to instrument her code, for example, without 
needing a stap install/script...?  It would be worth thinking about if
the MMV bits in your script could become a more general kernel module
for others to use too (in addition to stap, of course!).  The extremely
lightweight nature of the interface is such that it can be permanently
enabled (that's certainly how the userspace MMV is used).

Nice work anyway ... I'm looking forward to seeing the code in wider use.


Oh, to use the attached patch with PCP tools - apply to a "dev" PCP tree
(git.performancecopilot.org/pcp), build, cd src/pmdas/mmv/src, and ...

$ pminfo -L -Kclear -Kadd,70,./pmda_mmv.so,mmv_init mmv

... will serve as a quick sanity test that the metrics are functioning.
You can also use gdb on /usr/bin/pminfo to inspect behaviour of the PMDA.

Once thats going, installing it properly ('make install') and restarting
the pmcd service should get you a fully-functional PCP setup with your
kernel code firing on all cylinders (pmchart, pmie, everything will work
at that point).

cheers.

--
Nathan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: proc-mmv.patch --]
[-- Type: text/x-patch; name=proc-mmv.patch, Size: 5538 bytes --]

diff --git a/src/pmdas/mmv/src/mmv.c b/src/pmdas/mmv/src/mmv.c
index f3ef26f..92cd4e1 100644
--- a/src/pmdas/mmv/src/mmv.c
+++ b/src/pmdas/mmv/src/mmv.c
@@ -42,15 +42,21 @@ static int incnt;
 
 static int reload;
 static __pmnsTree * pmns;
-static int statsdir_code;		/* last statsdir stat code */
-static time_t statsdir_ts;		/* last statsdir timestamp */
 static char * prefix = "mmv";
 
 static char * pcptmpdir;		/* probably /var/tmp */
 static char * pcpvardir;		/* probably /var/pcp */
 static char * pcppmdasdir;		/* probably /var/pcp/pmdas */
 static char pmnsdir[MAXPATHLEN];	/* pcpvardir/pmns */
-static char statsdir[MAXPATHLEN];	/* pcptmpdir/<prefix> */
+
+typedef struct {
+    char	path[MAXPATHLEN];
+    int		status;			/* last statsdir stat return code */
+    time_t	timestamp;		/* last statsdir modify timestamp */
+} statsdir_t;
+
+static statsdir_t sdirlist[2];		/* userspace and kernel MMV files */
+#define SDIRCOUNT (sizeof(sdirlist)/sizeof(sdirlist[0]))
 
 typedef struct {
     char *	name;			/* strdup client name */
@@ -364,13 +370,44 @@ create_indom(pmdaExt *pmda, stats_t *s, mmv_disk_indom_t *id, pmInDom indom)
     return 0;
 }
 
+static int
+create_client_statdir(statsdir_t *statsdir)
+{
+    struct dirent **files;
+    int need_reload = 0;
+    int i, num;
+
+    num = scandir(statsdir->path, &files, NULL, NULL);
+    for (i = 0; i < num; i++) {
+	struct stat statbuf;
+	char path[MAXPATHLEN];
+	char *client;
+
+	if (files[i]->d_name[0] == '.')
+	    continue;
+
+	client = files[i]->d_name;
+	sprintf(path, "%s%c%s", statsdir->path, __pmPathSeparator(), client);
+
+	if (stat(path, &statbuf) >= 0 && S_ISREG(statbuf.st_mode))
+	    if (create_client_stat(client, path, statbuf.st_size) == -EAGAIN)
+		need_reload = 1;
+    }
+
+    for (i = 0; i < num; i++)
+	free(files[i]);
+    if (num > 0)
+	free(files);
+
+    return need_reload;
+}
+
 static void
 map_stats(pmdaExt *pmda)
 {
-    struct dirent **files;
     char name[64];
     int need_reload = 0;
-    int i, j, k, sts, num;
+    int i, j, k, sts;
 
     if (pmns)
 	__pmFreePMNS(pmns);
@@ -407,27 +444,8 @@ map_stats(pmdaExt *pmda)
 	scnt = 0;
     }
 
-    num = scandir(statsdir, &files, NULL, NULL);
-    for (i = 0; i < num; i++) {
-	struct stat statbuf;
-	char path[MAXPATHLEN];
-	char *client;
-
-	if (files[i]->d_name[0] == '.')
-	    continue;
-
-	client = files[i]->d_name;
-	sprintf(path, "%s%c%s", statsdir, __pmPathSeparator(), client);
-
-	if (stat(path, &statbuf) >= 0 && S_ISREG(statbuf.st_mode))
-	    if (create_client_stat(client, path, statbuf.st_size) == -EAGAIN)
-		need_reload = 1;
-    }
-
-    for (i = 0; i < num; i++)
-	free(files[i]);
-    if (num > 0)
-	free(files);
+    for (i = 0; i < SDIRCOUNT; i++)
+	need_reload += create_client_statdir(&sdirlist[i]);
 
     for (i = 0; slist && i < scnt; i++) {
 	stats_t * s = slist + i;
@@ -609,11 +627,38 @@ mmv_fetchCallBack(pmdaMetric *mdesc, unsigned int inst, pmAtomValue *atom)
     return 0;
 }
 
+static int
+check_time_statsdir(statsdir_t *statsdir)
+{
+    struct stat s;
+    int need_reload = 0;
+
+    /*
+     * check if the directory has been modified, reload if so;
+     * note modification may involve removal or newly appeared,
+     * a change in permissions from accessible to not (or vice-
+     * versa), and so on.
+     */
+    if (stat(statsdir->path, &s) >= 0) {
+	if (s.st_mtime != statsdir->timestamp) {
+	    need_reload++;
+	    statsdir->status = 0;
+	    statsdir->timestamp = s.st_mtime;
+	}
+    } else {
+	if (statsdir->status != oserror()) {
+	    statsdir->status = oserror();
+	    statsdir->timestamp = 0;
+	    need_reload++;
+	}
+    }
+    return need_reload;
+}
+
 static void
 mmv_reload_maybe(pmdaExt *pmda)
 {
     int i;
-    struct stat s;
     int need_reload = reload;
 
     /* check if generation numbers changed or monitored process exited */
@@ -629,26 +674,8 @@ mmv_reload_maybe(pmdaExt *pmda)
 	}
     }
 
-    /*
-     * check if the directory has been modified, reload if so;
-     * note modification may involve removal or newly appeared,
-     * a change in permissions from accessible to not (or vice-
-     * versa), and so on.
-     */
-    if (stat(statsdir, &s) >= 0) {
-	if (s.st_mtime != statsdir_ts) {
-	    need_reload++;
-	    statsdir_code = 0;
-	    statsdir_ts = s.st_mtime;
-	}
-    } else {
-	i = oserror();
-	if (statsdir_code != i) {
-	    statsdir_code = i;
-	    statsdir_ts = 0;
-	    need_reload++;
-	}
-    }
+    for (i = 0; i < SDIRCOUNT; i++)
+	need_reload += check_time_statsdir(&sdirlist[i]);
 
     if (need_reload) {
 	if (pmDebug & DBG_TRACE_APPL0)
@@ -808,6 +835,7 @@ mmv_init(pmdaInterface *dp)
 {
     int	m;
     int sep = __pmPathSeparator();
+    char *procdir;
 
     if (isDSO) {
 	pmdaDSO(dp, PMDA_INTERFACE_4, "mmv", NULL);
@@ -818,12 +846,15 @@ mmv_init(pmdaInterface *dp)
     pcptmpdir = pmGetConfig("PCP_TMP_DIR");
     pcpvardir = pmGetConfig("PCP_VAR_DIR");
     pcppmdasdir = pmGetConfig("PCP_PMDAS_DIR");
+    if ((procdir = getenv("PROC_STATSPATH")) == NULL)
+	procdir = "/proc";
 
-    snprintf(statsdir, sizeof(statsdir), "%s%c%s", pcptmpdir, sep, prefix);
     snprintf(pmnsdir, sizeof(pmnsdir), "%s%c" "pmns", pcpvardir, sep);
-    statsdir[sizeof(statsdir)-1] = '\0';
     pmnsdir[sizeof(pmnsdir)-1] = '\0';
 
+    snprintf(sdirlist[0].path, MAXPATHLEN-1, "%s%c%s", pcptmpdir, sep, prefix);
+    snprintf(sdirlist[1].path, MAXPATHLEN-1, "%s%c%s", procdir, sep, prefix);
+
     /* Initialize internal dispatch table */
     if (dp->status == 0) {
 	/*

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-22  1:32 ` Nathan Scott
@ 2014-07-22  2:57   ` Frank Ch. Eigler
  2014-07-22 14:50   ` [pcp] " David Smith
  1 sibling, 0 replies; 16+ messages in thread
From: Frank Ch. Eigler @ 2014-07-22  2:57 UTC (permalink / raw)
  To: Nathan Scott; +Cc: David Smith, pcp, Systemtap List


Hi, Nathan -

nathans wrote:

> [...]
> (OOC, what's {MODULE_NAME} in this context?)

It is usually a machine-generated unique identifier for the systemtap
script in question, such as "stap_deadbeefdeadbeefdeadbeef_2222".  It
is not generally predictable in advance.  (A user can override the
name, but that costs possible collisions.)


> [...] PMDA is written to be able to detect arrival/departure of new
> MMV files based on changes in a directory (and the location of that
> directory is parameterised via /etc/pcp.conf variables).  [...]

If we do end up sticking with this MMV approach, the PMDA would
probably need to use glob(3) or similar to search for
"/proc/systemtap/*/mmv" instead of "/proc/mmv/*".


> It also occurs to me that there's not really anything
> systemtap-specific about the kernel MMV instrumentation you've done,
> or is there?  [...]  It would be worth thinking about if the MMV
> bits in your script could become a more general kernel module for
> others to use too [...]

Yes, perhaps, though the path of code implementing a seemingly good
idea into the kernel is rarely smooth.  They generally dislike
infrastructure unless it's well-yearned-for.  So, if you wish to
pursue this goal, some serious selling on LKML will be needed.


- FChE

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-18 18:27     ` Frank Ch. Eigler
@ 2014-07-22 14:25       ` David Smith
  2014-07-22 15:21         ` Frank Ch. Eigler
  0 siblings, 1 reply; 16+ messages in thread
From: David Smith @ 2014-07-22 14:25 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Systemtap List, pcp

On 07/18/2014 01:27 PM, Frank Ch. Eigler wrote:
> Hi -
> 
> dsmith wrote:
> 
>> [...]
>>> Overall, are you happy with the general approach of reusing the exact
>>> MMV format (and thus the PMDA)?
>>
>> [...]
>> However, as I've worked with the MMV format I've come to realize its
>> limitations. As Nathan has pointed out in another email, the MMV format
>> is designed to only support exporting values, and isn't suited for more
>> event-like tracing. As far as the more technical side of things goes,
>> some of the internal offset logic might be done better/differently.
> 
> An application of pmda/mmv & pmda/logger to the same stap module could
> perhaps accomplish both goals (assuming we consider the pcp events
> overengineered to the extent that supplying timestamped strings is
> sufficient).  Have you considered an alternative unified design?
> 
> This reminds me of another PCP PMDA we've mentioned in the past: a
> JSON fetcher/parser.  We'll need something like this for a variety of
> non-systemtap purposes too (interop with JSON-producing tools).  What
> if stap were to produce pcp metrics in the form of /proc/systemtap/*
> JSON files that the PMDA would read on demand?  (The cost of the
> parsing overhead may be low enough not to worry about it.)  A separate
> generated JSON file could provide metadata.  That format could be rich
> enough to contain events too (mapped from arrays of string).

I think a JSON fetcher/parser is a good idea.

>>> At one point I suggested reworking the earlier prototype so that the
>>> bulk of the MMV format's emulation would be based on tapset script
>>> code (and possibly more declarative / dynamic / safe) rather than C.
>>> Have you come to any conclusions about the propriety of that?
>>
>> I've been focused on other things, like reworking the allocation logic.
>> As you describe it above, I'm not sure I see where you are headed.
> 
> To spell it out, the idea was to encode the mmv format logic
> (including metadata management) within a stap tapset script instead of
> as C in the runtime.  Then the C runtime would need to do nothing but
> provide a memory-mapped-byte-array kind of abstraction, and a way for
> the script code to read/write it (maybe a variant of
> sprintf("%b...")?).

Hmm. I think I see where you are going with this, but I don't know if it
will work well for a couple of reasons. One is that you don't just write
to this array, you have to be able to read it back (in order to shuffle
things around, hook up instances to indoms, etc.). In addition, the best
way to ensure that a client can read the data produced by the mmv stuff
is to use the same header file so we know the data is laid out the same way.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pcp] systemtap/pcp integration
  2014-07-22  1:32 ` Nathan Scott
  2014-07-22  2:57   ` Frank Ch. Eigler
@ 2014-07-22 14:50   ` David Smith
  2014-07-23 10:29     ` Nathan Scott
  1 sibling, 1 reply; 16+ messages in thread
From: David Smith @ 2014-07-22 14:50 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Systemtap List, pcp

On 07/21/2014 08:32 PM, Nathan Scott wrote:
> Hi David,
> 
> ----- Original Message -----
>> [...]
>> Note that systemtap will create a file called 'mmv' in
>> /proc/systemtap/{MODULE_NAME}. I've just been using pcp's 'mmvdump'
>> utility to dump the contents of the /proc/systemtap/{MODULE_NAME}/mmv
>> file. Currently the pcp mmv pmda only looks in one place for mmv files,
>> but it might be possible to create a symbolic link to systemtap's mmv
>> file to make it happy.
> 
> (OOC, what's {MODULE_NAME} in this context?)

As Frank mentioned, {MODULE_NAME} is the name of the module, usually
autogenerated.

> A symlink would sorta work but it feels like a bit of a workaround - the
> PMDA is written to be able to detect arrival/departure of new MMV files
> based on changes in a directory (and the location of that directory is
> parameterised via /etc/pcp.conf variables).  I'd not recommend trying to
> find it within a stap script ... I imagine it will be cleaner if we go
> for separate user/kernel locations for MMV files.

The symlink feels like a bit of a workaround, because it *is* one. Long
term we certainly want a better way to find the systemtap MMV files.

> Attached patch (lightly tested) implements the PCP side of things with
> that in mind - with this patch (and making the kernel code manage the
> lifecycle of separate /proc/mmv/* entries), things should begin to work
> out-of-the-box (the MMV PMDA is already default-enabled in the default
> pmcd config file, so everything else is in place for you).

This code sounds like a step in the right direction.

Thanks.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-22 14:25       ` David Smith
@ 2014-07-22 15:21         ` Frank Ch. Eigler
  2014-07-22 21:02           ` David Smith
  0 siblings, 1 reply; 16+ messages in thread
From: Frank Ch. Eigler @ 2014-07-22 15:21 UTC (permalink / raw)
  To: David Smith; +Cc: Systemtap List, pcp

Hi, David -


> [...]
> I think a JSON fetcher/parser is a good idea.

Good enough to redirect from MMV?


> [...]
> > To spell it out, the idea was to encode the mmv format logic
> > (including metadata management) within a stap tapset script instead of
> > as C in the runtime.  Then the C runtime would need to do nothing but
> > provide a memory-mapped-byte-array kind of abstraction, and a way for
> > the script code to read/write it (maybe a variant of
> > sprintf("%b...")?).
> 
> Hmm. I think I see where you are going with this, but I don't know if it
> will work well for a couple of reasons. One is that you don't just write
> to this array, you have to be able to read it back (in order to shuffle
> things around, hook up instances to indoms, etc.). 

(Not if the stap tapset just rewrites the whole metadata if necessary.)

> In addition, the best way to ensure that a client can read the data
> produced by the mmv stuff is to use the same header file so we know
> the data is laid out the same way.

Right ... though since MMV is a fixed ABI, a completely separate
implementation would help test the robustness of its specification.


- FChE

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-22 15:21         ` Frank Ch. Eigler
@ 2014-07-22 21:02           ` David Smith
  0 siblings, 0 replies; 16+ messages in thread
From: David Smith @ 2014-07-22 21:02 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Systemtap List, pcp

On 07/22/2014 10:21 AM, Frank Ch. Eigler wrote:
> Hi, David -
> 
> 
>> [...]
>> I think a JSON fetcher/parser is a good idea.
> 
> Good enough to redirect from MMV?

I think MMV and a JSON fetcher/parser aren't mutually exclusive. I'd say
they are complementary.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pcp] systemtap/pcp integration
  2014-07-22 14:50   ` [pcp] " David Smith
@ 2014-07-23 10:29     ` Nathan Scott
  2014-07-23 14:42       ` Frank Ch. Eigler
  0 siblings, 1 reply; 16+ messages in thread
From: Nathan Scott @ 2014-07-23 10:29 UTC (permalink / raw)
  To: David Smith; +Cc: Systemtap List, pcp

Hey David,

----- Original Message -----
> On 07/21/2014 08:32 PM, Nathan Scott wrote:
> > Hi David,
> > 
> > ----- Original Message -----
> >> [...]
> >> Note that systemtap will create a file called 'mmv' in
> >> /proc/systemtap/{MODULE_NAME}. I've just been using pcp's 'mmvdump'
> >> utility to dump the contents of the /proc/systemtap/{MODULE_NAME}/mmv
> >> file. Currently the pcp mmv pmda only looks in one place for mmv files,
> >> but it might be possible to create a symbolic link to systemtap's mmv
> >> file to make it happy.
> > 
> > (OOC, what's {MODULE_NAME} in this context?)
> 
> As Frank mentioned, {MODULE_NAME} is the name of the module, usually
> autogenerated.

Got it.  So, next I'm wondering... what is it doing here, in this interface
between systemtap/pcp?  (interfaces being "forever" and all that, the simpler
the better).  From a PCP POV it (appears to) add no value, so would seem to
be something we can remove/simplify?  OTOH maybe not -maybe you can use it as
follows:

In MMV (and the existing pmdammv, in particular), the basename of these files
is used to form the first component of the metric namespace - that is, unless
the caller (i.e. you, in your kernel code) sets MMV_FLAG_NOPREFIX.  In that
case, its ignored  (see also mmv_stats_init(3), "name" parameter).

So, perhaps for the stap case you could always set that flag, and use those
module names as the <xxx> in a /proc/mmv/<xxx> -alike scheme.  That'd be a
fairly simple mechanism that'd work directly with the patch I sent earlier,
I think (they'd need to be files, not directories though - else some slightly
more complex code is going to be needed).

> > A symlink would sorta work but it feels like a bit of a workaround - the
> > PMDA is written to be able to detect arrival/departure of new MMV files
> > based on changes in a directory (and the location of that directory is
> > parameterised via /etc/pcp.conf variables).  I'd not recommend trying to
> > find it within a stap script ... I imagine it will be cleaner if we go
> > for separate user/kernel locations for MMV files.
> 
> The symlink feels like a bit of a workaround, because it *is* one. Long

:)

> term we certainly want a better way to find the systemtap MMV files.
> 
> > Attached patch (lightly tested) implements the PCP side of things with
> > that in mind - with this patch (and making the kernel code manage the
> > lifecycle of separate /proc/mmv/* entries), things should begin to work
> > out-of-the-box (the MMV PMDA is already default-enabled in the default
> > pmcd config file, so everything else is in place for you).
> 
> This code sounds like a step in the right direction.

Well, hopefully something to experiment with anyway; it might get you close
to a complete end-to-end working scenario.

> [...] 
> I think MMV and a JSON fetcher/parser aren't mutually exclusive. I'd say
> they are complementary.

Using memory-mappings to share between user/kernel (or user/user as existing
PCP MMV code does) requires fixed locations in the mapping - we're accessing
it via pointers on both sides of the fence.  Entirely-string representations
(which is what I guess the intention with a JSON interface would be?) would
lose that property, since the strings (and the values in particular) do not
have fixed offsets within the mapping file anymore.

I guess you'd have to then completely start over again (for ... reasons?) and
implement it more like a traditional /proc interface (via seq_file), where
userspace sampling is via open/read/close, and not the mapped memory model
where the kernel is actively writing while userspace is actively reading.

I haven't seen any other need for a generic JSON interface, so can't really
comment as to its suitability for this kind of thing.  There is one existing
JSON PMDA - the elasticsearch PMDA uses JSON encodings fairly extensively -
but it's very specific & not a generic representation like you're after here.
Maybe you can use it to help gauge levels of complexity to expect though.

cheers.

--
Nathan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-23 10:29     ` Nathan Scott
@ 2014-07-23 14:42       ` Frank Ch. Eigler
  2014-07-24  7:40         ` Nathan Scott
  0 siblings, 1 reply; 16+ messages in thread
From: Frank Ch. Eigler @ 2014-07-23 14:42 UTC (permalink / raw)
  To: Nathan Scott; +Cc: David Smith, pcp, Systemtap List


nathans wrote:

> Got it.  So, next I'm wondering... what is [the module name] doing
> here, in this interface between systemtap/pcp?  [...]

systemtap is not a single bundle of instrumentation.  It is a tool for
building/running many instrumentation scripts.  The unique module name
lets the kernel tell them apart.


> In MMV (and the existing pmdammv, in particular), the basename of
> these files is used to form the first component of the metric
> namespace [...]

That's a most straightforward possibility, though not the most
user-friendly one, as the module names change from run to run.
Perhaps the systemtap script could propose a pmns prefix via an
auxiliary file.


> [...]  I guess you'd have to then completely start over again [for
> JSON]

Yes, but the systemtap side of this would be pretty trivial.  Printing
JSON strings is easy.


>  (for ... reasons?) 

Other messages in the thread pointed out some reasons, namely:

- ability to generalize to event traffic, not just sampled metrics
- robustness, by avoiding fragile C code


> [...]  I haven't seen any other need for a generic JSON interface
> [...]

In previous notes it was pointed out that JSON is a good encoding for
stats data, because it is exported by other tools like ceph, zabbix,
and many others including

> [...] elasticsearch [...]

So the proposal is to think about a single general JSON PMDA that can
be configured to bridge data from multiple JSON-emitting applications,
including systemtap scripts.


- FChE

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: systemtap/pcp integration
  2014-07-23 14:42       ` Frank Ch. Eigler
@ 2014-07-24  7:40         ` Nathan Scott
  0 siblings, 0 replies; 16+ messages in thread
From: Nathan Scott @ 2014-07-24  7:40 UTC (permalink / raw)
  To: Frank Ch. Eigler, David Smith; +Cc: pcp, Systemtap List



----- Original Message -----
> [...]
> Yes, but the systemtap side of this would be pretty trivial.  Printing
> JSON strings is easy.

Thats only a small part of the problem though (even just considering
the systemtap side of things), and there are many advantages to the MMV
format being so close to the representation the PMDA needs to export.

> >  (for ... reasons?)
> 
> Other messages in the thread pointed out some reasons

Those references were a little too vague for me to follow the intent.

> - ability to generalize to event traffic, not just sampled metrics

That's not really a reason to start over though - the MMV format is
versioned, it could easily be extended and all its many advantages
kept.  Or, a separate PMDA might make a better option there - there
is no reason one PMDA needs to service both these (very different)
tracing/sampling data models.

Anyway, do keep in mind the oft-mentioned cost of string conversions
(Ken mentioned on the last call, IIRC, David mentioned his concerns
originally, and I will say it again too) - every time we've profiled
pmdalinux, the CPU cost is always dominated by the string extraction/
conversion code for the procfs/sysfs files.

That will be orders of magnitude worse for event tracing, I imagine.

> - robustness, by avoiding fragile C code

Do you mean Davids systemtap code?  The MMV code in PCP has been around
for quite some time, and I know of a number of production environments
using it 24x7 on hundreds of machines, for many years - its not fragile
at all.  Stability in Davids code will come over time, if there really
is an issue there (?) but experience would suggest there is no inherent
flaw in the approach he's using.

> So the proposal is to think about a single general JSON PMDA that can
> be configured to bridge data from multiple JSON-emitting applications,
> including systemtap scripts.

Bridges like this have been implemented several times in the past and it
has never worked well.  The fundamental, unsolvable-in-the-general-case
problem is that we have well-defined metric semantics in PCP.  So these
bridges have to (attempt to) map from the usually-far-less-so,if-defined
-at-all semantics offered by the other tool, and come up with the highly
detailed PCP metric metadata, somehow.  To date custom PMDAs have always
done a better job - more simply-coded, and with domain-specific-code, as
designed for PCP PMDAs (we can still use JSON there of course, as in the
elasticsearch case).


FWIW, I chatted to someone last week who was actually using Zabbix in a
production environment - they were interested in exploring the opposite
direction to what you've described above.  IOW, they would like to have
a PCP component that talks Zabbix protocol (JSON-alike, IIRC) such that
they wouldn't have to install the Zabbix agent code everywhere.  They'd
like instead to rely on a PCP component (could pmwebd be taught to fill
this role, perhaps? dunno) giving them everything they need for all the
many machines they rollout, cookie-cutter style, from PCP.  Then, their
central Zabbix instance could continue doing their alerting, which they
like.  It sounded a little bit like the graphana/graphite exporting, and
I believe its also JSON-based, so just thought I'd mention it here.

cheers.

--
Nathan

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-07-24  7:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-17 21:14 systemtap/pcp integration David Smith
2014-07-18 15:49 ` Frank Ch. Eigler
2014-07-18 17:02   ` David Smith
2014-07-18 18:27     ` Frank Ch. Eigler
2014-07-22 14:25       ` David Smith
2014-07-22 15:21         ` Frank Ch. Eigler
2014-07-22 21:02           ` David Smith
2014-07-18 21:10 ` [pcp] " William Cohen
2014-07-21 15:43   ` David Smith
2014-07-21 15:54     ` William Cohen
2014-07-22  1:32 ` Nathan Scott
2014-07-22  2:57   ` Frank Ch. Eigler
2014-07-22 14:50   ` [pcp] " David Smith
2014-07-23 10:29     ` Nathan Scott
2014-07-23 14:42       ` Frank Ch. Eigler
2014-07-24  7:40         ` Nathan Scott

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).