public inbox for systemtap@sourceware.org
* systemtap/pcp integration pmda 0.1
@ 2014-09-12 18:37 David Smith
  2014-09-22 22:05 ` Frank Ch. Eigler
  0 siblings, 1 reply; 4+ messages in thread
From: David Smith @ 2014-09-12 18:37 UTC (permalink / raw)
  To: Systemtap List, pcp

[-- Attachment #1: Type: text/plain, Size: 2601 bytes --]

Here's version 0.1 (up from 0.01!) of my systemtap/pcp integration work
that uses systemtap (https://sourceware.org/systemtap/) to export JSON
data and a pcp (http://www.performancecopilot.org/) python pmda that
reads and processes the JSON data.

At this point things work reasonably well (at least with the test
systemtap script I've attached). There are still lots of "FIXME"
comments spread throughout the code. The systemtap side of things has
come along since last time: you no longer have to hand-write JSON to
output data. The script itself is based on some work that Will Cohen
has been doing to measure network latency. Note that the script does
output live data.

The biggest addition this time is array handling, which was a bit
tricky. I'm still not sure the indom handling on the pcp side is correct
(but it seems to follow the other python pmdas).
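
To make the array handling concrete: each JSON array element carries an "__id" field, and the instance domain maps that id to the element's position so the fetch callback can index back into the array. A stripped-down sketch of that mapping (the real pmda wraps the index in a ctypes c_int for replace_indom; this version keeps a plain int):

```python
def array_to_indom(array):
    # Map a JSON array of objects to {instance_name: array_index}.
    # "__id" names the PCP instance; the stored value is the element's
    # position, so a fetch can locate the matching object.
    return {item["__id"]: idx for idx, item in enumerate(array)}
```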

Here's what live data looks like from pcp:

====
# pminfo -df stap_json

stap_json.json.dummy2
    Data Type: string  InDom: PM_INDOM_NULL 0xffffffff
    Semantics: instant  Units: none
    value "dummy2"

stap_json.json.dummy_array.dummy2
    Data Type: string  InDom: 130.1 0x20800001
    Semantics: instant  Units: none
    inst [0 or "1"] value "def"
    inst [1 or "0"] value "abc"
    inst [2 or "2"] value "ghi"

stap_json.json.dummy_array.dummy1
    Data Type: 64-bit int  InDom: 130.1 0x20800001
    Semantics: counter  Units: none
    inst [0 or "1"] value 2
    inst [1 or "0"] value 1
    inst [2 or "2"] value 3

stap_json.json.net_xmit_data.xmit_latency
    Data Type: 64-bit int  InDom: 130.0 0x20800000
    Semantics: counter  Units: none
    inst [0 or "fake1"] value 0
    inst [1 or "fake2"] value 0
    inst [2 or "eth0"] value 319

stap_json.json.net_xmit_data.xmit_count
    Data Type: 64-bit int  InDom: 130.0 0x20800000
    Semantics: counter  Units: none
    inst [0 or "fake1"] value 0
    inst [1 or "fake2"] value 0
    inst [2 or "eth0"] value 2304551

stap_json.json.read_count
    Data Type: 64-bit int  InDom: PM_INDOM_NULL 0xffffffff
    Semantics: counter  Units: none
    value 8

stap_json.json.xstring
    Data Type: string  InDom: PM_INDOM_NULL 0xffffffff
    Semantics: instant  Units: none
    value "testing, 1, 2, 3"
====

The pcp pmda still only supports one systemtap script at this point. I
ran the attached systemtap script using the following command line to
produce the output above.

# stap -m json -v ./net_xmit_json5.stp eth0 fake1 fake2

Any comments/feedback would be appreciated.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

[-- Attachment #2: net_xmit_json5.stp --]
[-- Type: text/plain, Size: 11553 bytes --]

// This script tracks time between packet queue and xmit.
// The information is provided to userspace via procfs and is read
// by the stap_json PCP PMDA.

// ===========================
// ==== Tapset file start ====
// ===========================

global __json_metric_type, __json_metric_desc
global __json_array_metric_type, __json_array_metric_desc
global __json_metric_output, __json_array_output

function json_add_numeric_metric:long(name:string, description:string)
{
  if (name in __json_metric_type)
    error(sprintf("Metric '%s' already exists", name))
  __json_metric_type[name] = "integer"
  __json_metric_desc[name] = description
  return 0
}

function json_add_string_metric:long(name:string, description:string)
{
  if (name in __json_metric_type)
    error(sprintf("Metric '%s' already exists", name))
  __json_metric_type[name] = "string"
  __json_metric_desc[name] = description
  return 0
}

function json_add_array:long(name:string, description:string)
{
  if (name in __json_metric_type)
    error(sprintf("Metric '%s' already exists", name))
  __json_metric_type[name] = "array"
  __json_metric_desc[name] = description

  # Go ahead and add "__id", which is the array index.
  json_add_array_string_metric(name, "__id", "")
  return 0
}

function json_add_array_numeric_metric:long(array_name:string,
	metric_name:string, metric_description:string)
{
  if ([array_name, metric_name] in __json_array_metric_type)
    error(sprintf("Array metric '%s' already exists in array %s", metric_name,
		  array_name))
  __json_array_metric_type[array_name, metric_name] = "integer"
  __json_array_metric_desc[array_name, metric_name] = metric_description
  return 0
}

function json_add_array_string_metric:long(array_name:string,
	metric_name:string, metric_description:string)
{
  if ([array_name, metric_name] in __json_array_metric_type)
    error(sprintf("Array metric '%s' already exists in array %s", metric_name,
		  array_name))
  __json_array_metric_type[array_name, metric_name] = "string"
  __json_array_metric_desc[array_name, metric_name] = metric_description
  return 0
}

@define __json_output_metric(indent_str, name, type, description)
%(
  $value .= sprintf("%s\"%s\": {\n", @indent_str, @name)
  $value .= sprintf("%s  \"type\": \"%s\",\n", @indent_str, @type)
  if (strlen(@description) > 0)
    $value .= sprintf("%s  \"description\": \"%s\",\n", @indent_str,
		      @description)
  $value .= sprintf("%s  \"additionalProperties\": false\n%s}", @indent_str,
		    @indent_str)
%)

@define __json_output_array(indent_str, name, description)
%(
  $value .= sprintf("%s\"%s\": {\n", @indent_str, @name)
  $value .= sprintf("%s  \"type\": \"array\",\n", @indent_str)
  if (strlen(@description) > 0)
    $value .= sprintf("%s  \"description\": \"%s\",\n", @indent_str,
		      @description)
  $value .= sprintf("%s  \"additionalProperties\": false,\n", @indent_str)
  $value .= sprintf("%s  \"items\": {\n", @indent_str)
  $value .= sprintf("%s    \"type\": \"object\",\n", @indent_str)
  $value .= sprintf("%s    \"additionalProperties\": false,\n",
  		     @indent_str)
  $value .= sprintf("%s    \"properties\": {\n", @indent_str)
  __array_comma_needed = 0
  foreach ([__array_name, __metric_name] in __json_array_metric_type) {
    if (@name == __array_name) {
      if (__array_comma_needed)
        $value .= ",\n"
      __array_comma_needed = 1    

      __subindent_str = @indent_str . "      "
      __subtype = __json_array_metric_type[__array_name, __metric_name]
      __subdesc = __json_array_metric_desc[__array_name, __metric_name]
      @__json_output_metric(__subindent_str, __metric_name, __subtype,
      			    __subdesc)
    }
  }
  $value .= sprintf("\n%s    }\n", @indent_str)
  $value .= sprintf("%s  }\n", @indent_str)
  $value .= sprintf("%s}", @indent_str)
%)

@define json_output_schema
%(
  # Note: This is the "pretty-printed" version of the schema, intended
  # to be read by humans. We could remove the whitespace and newlines
  # if we wanted to make the output shorter (but less readable).
  #
  # Note 2: We have to break this long string into more than one
  # assignment since we're bumping up against MAXSTRINGLEN. The procfs
  # $value can hold more than MAXSTRINGLEN because of the
  # '.maxsize(N)' parameter.
  $value =
    "{\n"
    "  \"type\": \"object\",\n"
    "  \"title\": \"root\",\n"
    "  \"additionalProperties\": false,\n"
    "  \"properties\": {\n"
    "    \"generation\": {\n"
    "      \"type\": \"integer\",\n"
    "      \"additionalProperties\": false\n"
    "    },\n"
  $value .=
    "    \"data\": {\n"
    "      \"type\": \"object\",\n"
    "      \"additionalProperties\": false,\n"
    "      \"properties\": {\n"

  __comma_needed = 0
  foreach (__name in __json_metric_type) {
    if (__comma_needed)
      $value .= ",\n"
    __comma_needed = 1    

    if (__json_metric_type[__name] != "array") {
      @__json_output_metric("        ", __name, __json_metric_type[__name],
			    __json_metric_desc[__name])
    }
    else {
      @__json_output_array("        ", __name, __json_metric_desc[__name])
    }
  }

  $value .=
    "\n"
    "      },\n"
    "      \"required\": [\n"
  __comma_needed = 0
  foreach (__name in __json_metric_type) {
    if (__comma_needed)
      $value .= ",\n"
    __comma_needed = 1    
    $value .= sprintf("        \"%s\"", __name)
  }
  $value .=
    "\n"
    "      ]\n"
    "    }\n"
    "  }\n"
    "}\n"
%)

# NOTE: This is the "pretty-printed" version of the data, intended
# to be read by humans. We could remove the whitespace and newlines
# if we wanted to make the output shorter (but less readable).
@define json_output_data_start
%(
  __comma_needed = 0
  $value =
    "{\n"
    "  \"generation\": 1,\n"
    "  \"data\": {\n"
%)

# Make sure we don't try to output the same metric twice in the same
# data fetch.
@define __json_output_check(name)
%(
  if (@name in __json_metric_output)
    error(sprintf("Metric '%s' already output", @name))
  __json_metric_output[@name] = 1
%)

# Make sure we don't try to output the same array index twice in the same
# data fetch.
@define __json_output_array_check(array_index)
%(
  if (@array_index in __json_array_output)
    error(sprintf("Array index '%s' already output for array metric %s",
		  @array_index, __json_array_started))
  __json_array_output[@array_index] = 1
%)

# Output a string value.
@define json_output_string_value(name, value)
%(
  @__json_output_check(@name)
  @__json_output_array_end
  if (__comma_needed)
    $value .= ",\n"
  __comma_needed = 1    
  $value .= sprintf("    \"%s\": \"%s\"", @name, @value)
%)

# Output a numeric value.
@define json_output_numeric_value(name, value)
%(
  @__json_output_check(@name)
  @__json_output_array_end
  if (__comma_needed)
    $value .= ",\n"
  __comma_needed = 1    
  $value .= sprintf("    \"%s\": %d", @name, @value)
%)

# Output a string value for an array.
@define json_output_array_string_value(array_name, array_index, metric_name, value)
%(
  @__json_output_array_start(@array_name, @array_index)
  if (__comma_needed)
    $value .= ",\n"
  __comma_needed = 1    
  $value .= sprintf("        \"%s\": \"%s\"", @metric_name, @value)
%)

# Output a numeric value for an array.
@define json_output_array_numeric_value(array_name, array_index, metric_name,
					value)
%(
  @__json_output_array_start(@array_name, @array_index)
  if (__comma_needed)
    $value .= ",\n"
  __comma_needed = 1    
  $value .= sprintf("        \"%s\": %d", @metric_name, @value)
%)

# Handle the details of starting the output of an array.
@define __json_output_array_start(array_name, array_index)
%(
  if (__json_array_started != @array_name) {
    @__json_output_check(@array_name)
    @__json_output_array_end
    if (__comma_needed)
      $value .= ",\n"
    __comma_needed = 1    
    $value .= sprintf("    \"%s\": [\n", @array_name)
    $value .= "      {\n"
    __json_array_started = @array_name
  }
  if (__json_array_index_started != @array_index) {
    @__json_output_array_check(@array_index)
    if (__json_array_index_started != "") {
      $value .=
        "\n"
	"      },\n"
	"      {\n"
    }
    __json_array_index_started = @array_index
    $value .= sprintf("        \"__id\": \"%s\"", @array_index)
  }
%)

# Handle the details of finishing the output of an array.
@define __json_output_array_end
%(
  if (__json_array_started != "") {
    $value .=
      "\n"
      "      }\n"
      "    ]"
    __json_array_started = ""
    __json_array_index_started = ""
    delete __json_array_output
  }
%)

# Finish outputting data.
@define json_output_data_end
%(
  @__json_output_array_end
  $value .=
    "\n"
    "  }\n"
    "}\n"
  __comma_needed = 0
  delete __json_metric_output
%)

probe procfs("schema").read.maxsize(8192)
{
  @json_output_schema
}

probe json_data = procfs("data").read.maxsize(8192)
{
}

// =========================
// ==== Tapset file end ====
// =========================


global net_devices
global read_count

probe json_data
{
  @json_output_data_start
  @json_output_string_value("xstring", "testing, 1, 2, 3")
  @json_output_numeric_value("read_count", read_count)
  read_count++

  foreach (dev in net_devices) {
    if (@count(skb_queue_t[dev])) {
      // Note: xmit_count is the packet count and xmit_latency the
      // latency sum, matching the metric descriptions in probe begin.
      @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count",
				       @count(skb_queue_t[dev]))
      @json_output_array_numeric_value("net_xmit_data", dev, "xmit_latency",
				       @sum(skb_queue_t[dev]))
    }
    else {
      @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count", 0)
      @json_output_array_numeric_value("net_xmit_data", dev, "xmit_latency", 0)
    }
  }

  # Add dummy values just to test the interface
  @json_output_array_numeric_value("dummy_array", "0", "dummy1", 1)
  @json_output_array_string_value("dummy_array", "0", "dummy2", "abc")
  @json_output_array_numeric_value("dummy_array", "1", "dummy1", 2)
  @json_output_array_string_value("dummy_array", "1", "dummy2", "def")
  @json_output_array_numeric_value("dummy_array", "2", "dummy1", 3)
  @json_output_array_string_value("dummy_array", "2", "dummy2", "ghi")
  @json_output_string_value("dummy2", "dummy2")
  @json_output_data_end
}

// Set up the metrics
probe begin
{
  // fallback instance device "eth0" if none specified
  if (argc == 0) {
    argv[1] = "eth0"
  }
  
  // remember all the network devices
  for (i = 1; i <= argc; i++) {
    dev = argv[i]
    net_devices[dev] = i - 1
  }

  // Add the metrics
  json_add_string_metric("xstring", "Test string")
  json_add_numeric_metric("read_count", "Times values read")

  json_add_array("net_xmit_data", "Network transmit data indexed by ethernet device")
  json_add_array_numeric_metric("net_xmit_data", "xmit_count", "number of packets for xmit device")
  json_add_array_numeric_metric("net_xmit_data", "xmit_latency", "sum of latency for xmit device")

  // Add some dummy metrics just to test the tapset.
  json_add_array("dummy_array", "")
  json_add_array_numeric_metric("dummy_array", "dummy1", "")
  json_add_array_string_metric("dummy_array", "dummy2", "")
  json_add_string_metric("dummy2", "Test string")
}

// probes to track the information

global skb_queue_start_t, skb_queue_t

probe kernel.trace("net_dev_queue") {
  skb_queue_start_t[$skb] = gettimeofday_ns();
}

probe kernel.trace("net_dev_start_xmit"), kernel.trace("net_dev_xmit") {
  t = gettimeofday_ns();
  st = skb_queue_start_t[$skb]
  if (st){
    skb_queue_t[kernel_string($dev->name)] <<< t - st
    delete skb_queue_start_t[$skb]
  }
}

[-- Attachment #3: pmdastap_json.python --]
[-- Type: text/plain, Size: 15288 bytes --]

#!/usr/bin/python
import json
import jsonschema
import collections
from pcp.pmda import PMDA, pmdaMetric, pmdaIndom, pmdaInstid
import cpmapi as c_api
from pcp.pmapi import pmUnits, pmContext as PCP
from ctypes import c_int, POINTER, cast

class Metric(object):
    def __init__(self, name):
        self.name = name
        self.desc = ''
        self.type = c_api.PM_TYPE_UNKNOWN
        self.sem = c_api.PM_SEM_INSTANT
        self.pmid = None
        self.obj = None
        self.indom = c_api.PM_INDOM_NULL

class Indom(object):
    def __init__(self):
        self.obj = None
        self.values = {}

    def add_value(self, name, value):
        # PMDA.replace_indom() wants a dictionary, indexed by
        # indom string value. PMDA.replace_indom() doesn't really
        # care what is stored at that string value. We're storing the
        # array index there.
        self.values[name] = c_int(value)

class STAP_JSON_PMDA(PMDA):
    def __init__(self, pmda_name, domain):
        self.pmda_name = pmda_name
        PMDA.__init__(self, self.pmda_name, domain)

        # Load the schema and data.
        self.metrics = {}
        self.load_json_schema()
        self.load_json_data()

        # Make sure the data fits the schema.
        jsonschema.validate(self.json_data, self.schema)

        # Update the indom list.
        self.indoms = {}
        self.refresh()

        # Parse the schema header, looking for the 'root' name of the
        # data (all metrics get created under this name) and create
        # the metrics as needed.
        #
        # FIXME: later this will be the module name. For now, hardcode
        # it to 'json'.
        self.root_name = "json"
        self._parse_schema()

        self.set_fetch(self._fetch)
        self.set_fetch_callback(self._fetch_callback)
        self.set_store_callback(self._store_callback)

    def load_json_schema(self):
        # Load schema
        f = open("/proc/systemtap/json/schema")
        try:
            self.schema = json.load(f, object_pairs_hook=collections.OrderedDict)
        except:
            self.schema = {}
        f.close()

    def load_json_data(self):
        # Load data
        f = open("/proc/systemtap/json/data")
        try:
            self.json_data = json.load(f,
                                       object_pairs_hook=collections.OrderedDict)
        except:
            self.json_data = {}
        f.close()

    def refresh(self):
        # Notice we never delete indoms, we just keep adding.
        for array_name in self.indoms.keys():
            index = 0
            try:
                # json_data['data'][array_name] is a list of
                # dictionaries.
                for item in self.json_data['data'][array_name]:
                    self.indoms[array_name].add_value(item['__id'], index)
                    index += 1
            except:
                pass
            self.replace_indom(self.indoms[array_name].obj,
                               self.indoms[array_name].values)

    def _add_metric(self, metric_info):
        metric_info.pmid = self.pmid(0, self.metric_idx)
        # FIXME: we'll need to handle units/scale at some point...
        metric_info.obj = pmdaMetric(metric_info.pmid, metric_info.type,
                                     metric_info.indom, metric_info.sem,
                                     pmUnits(0, 0, 0, 0, 0, 0))

        self.add_metric("%s.%s.%s" % (self.pmda_name, self.root_name,
                                           metric_info.name),
                        metric_info.obj, metric_info.desc)
        self.metrics[self.metric_idx] = metric_info
        self.metric_idx += 1

    def _parse_array_schema(self, array_name, properties):
        # First process the array schema "header" information.
        array_properties = None
        for (key, value) in properties.items():
            # 'type' (required): Sanity check it.
            if key == 'type':
                if not isinstance(value, unicode):
                    raise TypeError
                if value != 'object':
                    raise TypeError, \
                        ("Type attribute has unknown value '%s'" % value)
            # 'additionalProperties' (optional): Ignore it.
            elif key == "additionalProperties":
                # Do nothing.
                pass
            # 'properties' (required): Type check it and save for later.
            elif key == "properties":
                if not isinstance(value, dict):
                    raise TypeError
                array_properties = value
            # For everything else, raise an error.
            else:
                raise RuntimeError, "Unknown attribute '%s'" % key
        if not array_properties:
            raise RuntimeError, "Schema has no 'properties' attribute"

        if not self.indoms.has_key(array_name):
            # Note that we're creating an indom here, but we don't
            # know any values for it yet. We'll get those on a data
            # read.
            self.indoms[array_name] = Indom()
            self.indoms[array_name].obj = self.indom(self.indom_idx)
            self.indom_idx += 1

        # If we're here, we know the array "header" was
        # reasonable. Now process "properties", which is the real data
        # description.
        for (name, attributes) in array_properties.items():
            metric_info = Metric("%s.%s" % (array_name, name))
            metric_info.indom = self.indoms[array_name].obj

            for (key, value) in attributes.items():
                # 'type' (required): Sanity check it and save it.
                if key == 'type':
                    if not isinstance(value, unicode):
                        raise TypeError
                    if value == 'string':
                        metric_info.type = c_api.PM_TYPE_STRING
                        metric_info.sem = c_api.PM_SEM_INSTANT
                    elif value == 'integer':
                        metric_info.type = c_api.PM_TYPE_64
                        metric_info.sem = c_api.PM_SEM_COUNTER
                    else:
                        raise TypeError, \
                            ("Type attribute has unknown value '%s'" % value)
                # 'description' (optional): Type check it and save it.
                elif key == 'description':
                    if not isinstance(value, unicode):
                        raise TypeError
                    metric_info.desc = value
                # 'additionalProperties' (optional): Ignore it.
                elif key == "additionalProperties":
                    # Do nothing.
                    pass
                # For everything else, raise an error.
                else:
                    raise RuntimeError, \
                        ("Schema for '%s' has an unknown attribute '%s'"
                         % (name, key))

            # Make sure we have everything we need.
            if metric_info.type == c_api.PM_TYPE_UNKNOWN:
                raise RuntimeError, ("Schema for '%s' has no 'type' attribute"
                                     % name)

            # Add the metric (if it isn't our special '__id' metric).
            if name != '__id':
                self._add_metric(metric_info)

    def _parse_schema(self):
        '''
        Go through the schema, looking for information we can use to create
        the pcp representation of the schema. Note that we don't support
        every possible JSON schema, we're looking for certain items.

        Refer to the following link for details of JSON schemas:

        <http://tools.ietf.org/html/draft-zyp-json-schema-03>
        '''

        # First process the schema "header" information.
        data_header = None
        for (key, value) in self.schema.items():
            # 'type' (required): Just sanity check it.
            if key == "type":
                if not isinstance(value, unicode) or value != "object":
                    raise TypeError
            # 'title' (optional): Type check it.
            elif key == "title":
                if not isinstance(value, unicode):
                    raise TypeError
            # 'description' (optional): Type check it.
            elif key == "description":
                if not isinstance(value, unicode):
                    raise TypeError
            # 'additionalProperties' (optional): Ignore it.
            elif key == "additionalProperties":
                # Do nothing.
                pass
            # 'properties' (required): Type check it and save for later.
            elif key == "properties":
                if not isinstance(value, dict):
                    raise TypeError
                data_header = value
            # For everything else, raise an error.
            else:
                raise RuntimeError, "Unknown attribute '%s'" % key
        
        # Pick the right field for the root of the namespace - prefer
        # "title" over "description".
        #if self.schema.has_key("title"):
        #    self.root_name = self.schema["title"]
        #elif self.schema.has_key("description"):
        #    self.root_name = self.schema["description"]
        #else:
        #    raise RuntimeError, "No 'title' or 'description' field in schema header"
    
        # If we're here, we know the "header" was reasonable. Now process
        # "properties", which is the data "header".
        if not data_header:
            raise RuntimeError, "Schema has no 'properties' attribute"
        data_properties = None
        for (key, value) in data_header.items():
            # 'generation' (required): Just sanity check it.
            if key == "generation":
                if not isinstance(value, dict):
                    raise TypeError
            # 'data' (required): Type check it.
            elif key == "data":
                if not isinstance(value, dict) \
                   or not value.has_key("properties") \
                   or not isinstance(value["properties"], dict):
                    raise TypeError
                data_properties = value["properties"]
            # For everything else, raise an error.
            else:
                raise RuntimeError, "Unknown attribute '%s'" % key
        
        # If we're here, we know the data "header" was reasonable. Now process
        # "properties.data.properties", which is the real data description.
        if not data_properties:
            raise RuntimeError, "Schema has no 'properties.data.properties' attribute"
        self.metric_idx = 0
        self.indom_idx = 0
        for (name, attributes) in data_properties.items():
            metric_info = Metric(name)
            for (key, value) in attributes.items():
                # 'type' (required): Sanity check it and save it.
                if key == 'type':
                    if not isinstance(value, unicode):
                        raise TypeError
                    if value == 'string':
                        metric_info.type = c_api.PM_TYPE_STRING
                        metric_info.sem = c_api.PM_SEM_INSTANT
                    elif value == 'integer':
                        metric_info.type = c_api.PM_TYPE_64
                        metric_info.sem = c_api.PM_SEM_COUNTER
                    elif value == 'array':
                        # For arrays, we have to create metrics for
                        # each subitem in the array, using the same
                        # indom. This happens in the 'items' handling
                        # below.
                        metric_info.type = c_api.PM_TYPE_NOSUPPORT
                    else:
                        raise TypeError, \
                            ("Type attribute has unknown value '%s'" % value)
                # 'description' (optional): Type check it and save it.
                elif key == 'description':
                    if not isinstance(value, unicode):
                        raise TypeError
                    metric_info.desc = value
                # 'additionalProperties' (optional): Ignore it.
                elif key == "additionalProperties":
                    # Do nothing.
                    pass
                # 'default' (optional): Ignore it (for now).
                elif key == "default":
                    # Do nothing for now.
                    pass
                elif key == "items":
                    if metric_info.type != c_api.PM_TYPE_NOSUPPORT:
                        raise RuntimeError, \
                            ("Schema has an 'items' item for non-array '%s'"
                             % name)

                    # If we're here, we're processing an array's
                    # schema. For arrays, we have to create metrics for
                    # each subitem in the array, using the same
                    # indom.
                    self._parse_array_schema(name, value)
                # For everything else, raise an error.
                else:
                    raise RuntimeError, \
                        ("Schema for '%s' has an unknown attribute '%s'"
                         % (name, key))

            # Make sure we have everything we need.
            if metric_info.type == c_api.PM_TYPE_UNKNOWN:
                raise RuntimeError, ("Schema for '%s' has no 'type' attribute"
                                     % name)

            # Add the metric.
            if metric_info.type != c_api.PM_TYPE_NOSUPPORT:
                self._add_metric(metric_info)

    def _fetch(self):
        ''' Called once per "fetch" PDU, before callbacks '''
        self.load_json_data()
        self.refresh()

    def _fetch_callback(self, cluster, item, inst):
        '''
        Main fetch callback. Returns a list of value,status (single
        pair) for requested pmid/inst.
        '''
        if cluster != 0:
            return [c_api.PM_ERR_PMID, 0]
        try:
            metric_info = self.metrics[item]
        except:
            return [c_api.PM_ERR_PMID, 0]

        # Handle array metrics.
        if metric_info.indom != c_api.PM_INDOM_NULL:
            # Get the array index from the indom.
            voidp = self.inst_lookup(metric_info.indom, inst)
            if voidp is None:
                return [c_api.PM_ERR_INST, 0]
            array_indexp = cast(voidp, POINTER(c_int))
            array_index = array_indexp.contents.value

            # Split the full name into the array name and metric name.
            (array, metric) = metric_info.name.split('.', 1)
            try:
                return [self.json_data['data'][array][array_index][metric], 1]
            except:
                pass
        # Handle single-valued metrics.
        else:
            try:
                return [self.json_data['data'][metric_info.name], 1]
            except:
                pass
        return [c_api.PM_ERR_TYPE, 0]

    def _store_callback(self, cluster, item, inst, val):
        '''
        Store callback, executed when a request to write to a metric
        happens. Returns a single value.
        '''
        # Since we don't support storing values, always fail.
        return c_api.PM_ERR_PERMISSION

if __name__ == '__main__':
    STAP_JSON_PMDA('stap_json', 130).run()

[-- Attachment #4: data --]
[-- Type: text/plain, Size: 710 bytes --]

{
  "generation": 1,
  "data": {
    "xstring": "testing, 1, 2, 3",
    "read_count": 9,
    "net_xmit_data": [
      {
        "__id": "eth0",
        "xmit_count": 7699136,
        "xmit_latency": 1109
      },
      {
        "__id": "fake1",
        "xmit_count": 0,
        "xmit_latency": 0
      },
      {
        "__id": "fake2",
        "xmit_count": 0,
        "xmit_latency": 0
      }
    ],
    "dummy_array": [
      {
        "__id": "0",
        "dummy1": 1,
        "dummy2": "abc"
      },
      {
        "__id": "1",
        "dummy1": 2,
        "dummy2": "def"
      },
      {
        "__id": "2",
        "dummy1": 3,
        "dummy2": "ghi"
      }
    ],
    "dummy2": "dummy2"
  }
}

[-- Attachment #5: schema --]
[-- Type: text/plain, Size: 2321 bytes --]

{
  "type": "object",
  "title": "root",
  "additionalProperties": false,
  "properties": {
    "generation": {
      "type": "integer",
      "additionalProperties": false
    },
    "data": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "xstring": {
          "type": "string",
          "description": "Test string",
          "additionalProperties": false
        },
        "read_count": {
          "type": "integer",
          "description": "Times values read",
          "additionalProperties": false
        },
        "net_xmit_data": {
          "type": "array",
          "description": "Network transmit data indexed by ethernet device",
          "additionalProperties": false,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
              "__id": {
                "type": "string",
                "additionalProperties": false
              },
              "xmit_count": {
                "type": "integer",
                "description": "number of packets for xmit device",
                "additionalProperties": false
              },
              "xmit_latency": {
                "type": "integer",
                "description": "sum of latency for xmit device",
                "additionalProperties": false
              }
            }
          }
        },
        "dummy_array": {
          "type": "array",
          "additionalProperties": false,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
              "__id": {
                "type": "string",
                "additionalProperties": false
              },
              "dummy1": {
                "type": "integer",
                "additionalProperties": false
              },
              "dummy2": {
                "type": "string",
                "additionalProperties": false
              }
            }
          }
        },
        "dummy2": {
          "type": "string",
          "description": "Test string",
          "additionalProperties": false
        }
      },
      "required": [
        "xstring",
        "read_count",
        "net_xmit_data",
        "dummy_array",
        "dummy2"
      ]
    }
  }
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: systemtap/pcp integration pmda 0.1
  2014-09-12 18:37 systemtap/pcp integration pmda 0.1 David Smith
@ 2014-09-22 22:05 ` Frank Ch. Eigler
  2014-09-23 14:39   ` David Smith
  0 siblings, 1 reply; 4+ messages in thread
From: Frank Ch. Eigler @ 2014-09-22 22:05 UTC (permalink / raw)
  To: David Smith; +Cc: Systemtap List, pcp


dsmith wrote:

> Here's version 0.1 (up from 0.01!) of my systemtap/pcp integration work

Thanks a lot!

> that uses systemtap (https://sourceware.org/systemtap/) to export JSON
> data and a pcp (http://www.performancecopilot.org/) python pmda that
> reads and processes the JSON data. [...]

(Sorry I missed this when it went by -- please call the next
version 3.14 or something else very different from 0.1! :-)


> # pminfo -df stap_json
>
> stap_json.json.dummy2
>     Data Type: string  InDom: PM_INDOM_NULL 0xffffffff
>     Semantics: instant  Units: none
>     value "dummy2"
>
> stap_json.json.dummy_array.dummy2
>     Data Type: string  InDom: 130.1 0x20800001
>     Semantics: instant  Units: none
>     inst [0 or "1"] value "def"
>     inst [1 or "0"] value "abc"
>     inst [2 or "2"] value "ghi"
> [...]

Looking good!


> // ===========================
> // ==== Tapset file start ====
> // ===========================
>

Looks long but generally good.  (But see below re. suggestions about
the schema/data representations.)


> global net_devices
> global read_count
>
> probe json_data
> {
>   @json_output_data_start
>   @json_output_string_value("xstring", "testing, 1, 2, 3")
>   @json_output_numeric_value("read_count", read_count)
>   read_count++
>
>   foreach (dev in net_devices) {
>     if (@count(skb_queue_t[dev])) {
>       @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count",
> 				       @sum(skb_queue_t[dev]))
>       @json_output_array_numeric_value("net_xmit_data", dev, "xmit_latency",
> 				       @count(skb_queue_t[dev]))
> [...]
>   @json_output_data_end
> }
>
> // Set up the metrics
> probe begin
> {
> [...]
>   json_add_string_metric("xstring", "Test string")
>   json_add_numeric_metric("read_count", "Times values read")
> [...]
> }


Have you considered merging together these two bits of code, so that a
single stap probe alias that generates json data is also used to
populate metadata globals, so a subsequent (!) schema json query would
be possible?  Something like

probe json_data {
   @json_output_string_value("xstring", "testing 1, 2, 3", "Test String")
   @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count", 3,
                         "sum of latency for xmit device", $UNITS/SCALE)
}

so that the metadata is attached at the end of the data-supplying calls?



> #!/usr/bin/python
> import json
> import jsonschema
> import collections
> [...]

This PMDA code looks quite manageable; in particular the cumulative
nature of indom changes looks right on.  Please don't shoot me as I'll
suggest a somewhat different schema/data encoding below; I'm pretty
sure they're not too hard to express in the python code.  The main
reason for proposing the changes is so that this pmda has a fighting
chance at consuming json data from sources other than just the above
systemtap tapset.



> {
>   "generation": 1,
>   "data": {

(IMHO we shouldn't mandate such wrappers.)


>     "xstring": "testing, 1, 2, 3",
>     "read_count": 9,
>     "net_xmit_data": [
>       {
>         "__id": "eth0",
>         "xmit_count": 7699136,
>         "xmit_latency": 1109
>       },

(We already mentioned in passing how the "__id" string might be
desirable to be schema-configured.)


Anyway, onto the schema.  

I see how you chose the json-schema.org to piggyback-ride on.  One
thing we should keep in mind, though, is that json-schema does a slightly
different thing than what we need.  It's more like an XML DTD, and
just describes what's a "grammatically correct" document.  We do not
really need this exact kind of checking, but it's not a big hindrance
either - it's not grossly wordy.  (The "additionalProperties=false"
might be an example of unhelpful wordiness though.)

What we really need is the interpretation, for purposes of extracting
data and relaying to PCP.  And for that, we can be a little more
aggressive in the sense of adding our own schema elements, rather than
riding on top of json-schema.org patterns.  For example:


> {
>   "type": "object",
>   "title": "root",
> [...]
>   "properties": {
> [...]
>     "data": {
> [...]
>       "properties": {
>         "xstring": {
>           "type": "string",
>           "description": "Test string",
>           "additionalProperties": false
>         },
> [...]

The prototype PMDA turns this into metric "json.xstring" of pcp
PM_TYPE_STRING.  The heuristic's probably fine, but if we want
more generality, we could as well do something like this to
describe a scalar:

{ "foo": { "xstring":
     { "pcp-name":"foo.bar.xstring",
       "pcp-type":"string",          // esp. if pmda offers to cast
       "pcp-units":"MBytes/sec",     // need an inverter for pmUnitsStr(3)
       "pcp-semantics":"instant",    // need an inverter for PM_SEM_*
       "pcp-shorthelp":"short help", 
       "pcp-longhelp":"long help" 
     }
} }

(Adding the json-schema fields is optional & orthogonal.)

One benefit of a formal "pcp-name" field here is that the mapping from
the JSON nesting structure need not match the pcp namespace exactly.
It would let the json object name components be free of constraints
like not containing dots (since we would not propagate them to pcp).
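Roughly, in the python pmda such a schema walk might look like the
following sketch (the pcp-* keys are the hypothetical ones proposed
above, and collect_metrics is an illustrative name, not actual pmda
code):

```python
import json

# Walk a pcp-annotated schema and collect one descriptor per scalar,
# keyed by its pcp-name.  The "pcp-*" keys are the hypothetical ones
# proposed above, not an existing format.
def collect_metrics(node, json_path=()):
    metrics = {}
    if isinstance(node, dict):
        if "pcp-name" in node:
            metrics[node["pcp-name"]] = {
                "json-path": json_path,  # where the value lives in the data
                "type": node.get("pcp-type", "string"),
                "semantics": node.get("pcp-semantics", "instant"),
                "units": node.get("pcp-units", "none"),
                "help": node.get("pcp-shorthelp", ""),
            }
        else:
            for key, child in node.items():
                metrics.update(collect_metrics(child, json_path + (key,)))
    return metrics

schema = json.loads('''
{ "foo": { "xstring":
     { "pcp-name": "foo.bar.xstring",
       "pcp-type": "string",
       "pcp-semantics": "instant" } } }
''')

metrics = collect_metrics(schema)
# metrics["foo.bar.xstring"]["json-path"] is ("foo", "xstring"): the
# pcp namespace need not mirror the JSON nesting.
```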


>         "net_xmit_data": {
>           "type": "array",
>           "description": "Network transmit data indexed by ethernet device",
>           "additionalProperties": false,
>           "items": {
>             "type": "object",
>             "additionalProperties": false,
>             "properties": {
>               "__id": {
>                 "type": "string",
>                 "additionalProperties": false
>               },
>               "xmit_count": {
>                 "type": "integer",
>                 "description": "number of packets for xmit device",
>                 "additionalProperties": false
>               },
> [...]

Here's an alternative formulation kind of along the previous one:


{ "bar": { "networks":
     { "pcp-indom-discriminator":"__id",  // parametrizing this
       "type":"array",   // json-schema style identification of array-ness
       "items": {
          "xmit": {
            "pcp-name":"bar.xmitfoo",
            "pcp-type":"float",           // (stap can print fp with some effort)
            "pcp-units":"Bytes/hour",
            "pcp-semantics":"instant",
            "pcp-shorthelp":"short help", 
            "pcp-longhelp":"long help" 
            }
      } }
} }


Again, the json-schema parts are mostly orthogonal (just kiting the
array-ness description).
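On the instance side, extraction driven by the discriminator field
proposed above is similarly small; a sketch (extract_instances is just
an illustrative name):

```python
# Map each element of a JSON array to (instance-name, values), with
# the instance name taken from a schema-configurable discriminator
# field ("__id" is just the default here, per the proposal above).
def extract_instances(array_value, discriminator="__id"):
    instances = {}
    for element in array_value:
        name = element[discriminator]
        instances[name] = {k: v for k, v in element.items()
                           if k != discriminator}
    return instances

data = [
    {"__id": "eth0", "xmit_count": 7699136, "xmit_latency": 1109},
    {"__id": "fake1", "xmit_count": 0, "xmit_latency": 0},
]
insts = extract_instances(data)
# insts["eth0"]["xmit_latency"] is 1109; "__id" itself is stripped
```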


So what would something like this give us?  At the pure stap-pmda
level, not that much extra over what the 0.1 prototype has.  But
beyond stap, we may well be able to write some schema for more general
json files, trivially e.g. ones that lack the "data" as top-level
wrapper.

Before I go and write a bigger example schema of some other JSON data
we have lying around, do you see what I'm getting at?  Do you agree
that this style is also implementable in the python pmda?


- FChE


* Re: systemtap/pcp integration pmda 0.1
  2014-09-22 22:05 ` Frank Ch. Eigler
@ 2014-09-23 14:39   ` David Smith
  2014-09-23 18:44     ` Frank Ch. Eigler
  0 siblings, 1 reply; 4+ messages in thread
From: David Smith @ 2014-09-23 14:39 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Systemtap List, pcp

On 09/22/2014 05:04 PM, Frank Ch. Eigler wrote:
> 
> dsmith wrote:
> 
>> Here's version 0.1 (up from 0.01!) of my systemtap/pcp integration work
> 
> Thanks a lot!
> 
>> that uses systemtap (https://sourceware.org/systemtap/) to export JSON
>> data and a pcp (http://www.performancecopilot.org/) python pmda that
>> reads and processes the JSON data. [...]
> 
> (Sorry I missed this when it went by -- please call the next
> version 3.14 or something else very different from 0.1! :-)

The 1st one was called '0.01'. See, I'm making progress...

>> # pminfo -df stap_json
>>
>> stap_json.json.dummy2
>>     Data Type: string  InDom: PM_INDOM_NULL 0xffffffff
>>     Semantics: instant  Units: none
>>     value "dummy2"
>>
>> stap_json.json.dummy_array.dummy2
>>     Data Type: string  InDom: 130.1 0x20800001
>>     Semantics: instant  Units: none
>>     inst [0 or "1"] value "def"
>>     inst [1 or "0"] value "abc"
>>     inst [2 or "2"] value "ghi"
>> [...]
> 
> Looking good!
> 
> 
>> // ===========================
>> // ==== Tapset file start ====
>> // ===========================
>>
> 
> Looks long but generally good.  (But see below re. suggestions about
> the schema/data representations.)
> 
> 
>> global net_devices
>> global read_count
>>
>> probe json_data
>> {
>>   @json_output_data_start
>>   @json_output_string_value("xstring", "testing, 1, 2, 3")
>>   @json_output_numeric_value("read_count", read_count)
>>   read_count++
>>
>>   foreach (dev in net_devices) {
>>     if (@count(skb_queue_t[dev])) {
>>       @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count",
>> 				       @sum(skb_queue_t[dev]))
>>       @json_output_array_numeric_value("net_xmit_data", dev, "xmit_latency",
>> 				       @count(skb_queue_t[dev]))
>> [...]
>>   @json_output_data_end
>> }
>>
>> // Set up the metrics
>> probe begin
>> {
>> [...]
>>   json_add_string_metric("xstring", "Test string")
>>   json_add_numeric_metric("read_count", "Times values read")
>> [...]
>> }
> 
> 
> Have you considered merging together these two bits of code, so that a
> single stap probe alias that generates json data is also used to
> populate metadata globals, so a subsequent (!) schema json query would
> be possible?  Something like
> 
> probe json_data {
>    @json_output_string_value("xstring", "testing 1, 2, 3", "Test String")
>    @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count", 3,
>                          "sum of latency for xmit device", $UNITS/SCALE)
> }
> 
> so that the metadata is attached at the end of the data-supplying calls?

Hmm, I hadn't considered that. It might be possible to do, but seems
quite tricky in getting the macros right to support both schema output
and data output.

>> #!/usr/bin/python
>> import json
>> import jsonschema
>> import collections
>> [...]
> 
> This PMDA code looks quite manageable; in particular the cumulative
> nature of indom changes looks right on.  Please don't shoot me as I'll
> suggest a somewhat different schema/data encoding below; I'm pretty
> sure they're not too hard to express in the python code.  The main
> reason for proposing the changes is so that this pmda has a fighting
> chance at consuming json data from sources other than just the above
> systemtap tapset.
> 
> 
> 
>> {
>>   "generation": 1,
>>   "data": {
> 
> (IMHO we shouldn't mandate such wrappers.)

Here's the deal here. I stole the "generation" idea from the mmv code.
If I want to support being able to add/remove fields on the fly, I have
to let the pmda know something has changed. Hence the "generation"
field. In theory a change in the generation value would trigger the pmda
to re-read the JSON schema. (I say "in theory" because I haven't
implemented that yet in the systemtap or pcp side.)

There may be other ways to notice added/removed fields.

The only reason for the "data" wrapper is that I didn't want to disallow
a user's "generation" data field.
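Sketched in python, the pmda side of that handshake would be something
like this (JsonSource is an illustrative name, not the actual pmda
class):

```python
import json

# Keep the last "generation" seen; when it changes, the pmda knows the
# schema needs to be re-read (the re-read itself isn't implemented yet,
# on either side).
class JsonSource:
    def __init__(self):
        self.generation = None

    def refresh(self, raw):
        doc = json.loads(raw)
        schema_changed = doc["generation"] != self.generation
        if schema_changed:
            self.generation = doc["generation"]
        return schema_changed, doc["data"]

src = JsonSource()
changed, data = src.refresh('{"generation": 1, "data": {"read_count": 9}}')
# changed is True on the first read, then False until the exporter
# bumps "generation"
```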

>>     "xstring": "testing, 1, 2, 3",
>>     "read_count": 9,
>>     "net_xmit_data": [
>>       {
>>         "__id": "eth0",
>>         "xmit_count": 7699136,
>>         "xmit_latency": 1109
>>       },
> 
> (We already mentioned in passing how the "__id" string might be
> desirable to be schema-configured.)

Yep, it could be possible to have that configurable.

> Anyway, onto the schema.  
> 
> I see how you chose the json-schema.org to piggyback-ride on.  One
> thing we should keep in mind, though, is that json-schema does a slightly
> different thing than what we need.  It's more like an XML DTD, and
> just describes what's a "grammatically correct" document.  We do not
> really need this exact kind of checking, but it's not a big hindrance
> either - it's not grossly wordy.  (The "additionalProperties=false"
> might be an example of unhelpful wordiness though.)

The "additionalProperties=false" is optional, and just a bit more
correct. We can remove it.

> What we really need is the interpretation, for purposes of extracting
> data and relaying to PCP.  And for that, we can be a little more
> aggressive in the sense of adding our own schema elements, rather than
> riding on top of json-schema.org patterns.  For example:
> 
> 
>> {
>>   "type": "object",
>>   "title": "root",
>> [...]
>>   "properties": {
>> [...]
>>     "data": {
>> [...]
>>       "properties": {
>>         "xstring": {
>>           "type": "string",
>>           "description": "Test string",
>>           "additionalProperties": false
>>         },
>> [...]
> 
> The prototype PMDA turns this into metric "json.xstring" of pcp
> PM_TYPE_STRING.  The heuristic's probably fine, but if we want
> more generality, we could as well do something like this to
> describe a scalar:
> 
> { "foo": { "xstring":
>      { "pcp-name":"foo.bar.xstring",
>        "pcp-type":"string",          // esp. if pmda offers to cast
>        "pcp-units":"MBytes/sec",     // need an inverter for pmUnitsStr(3)
>        "pcp-semantics":"instant",    // need an inverter for PM_SEM_*
>        "pcp-shorthelp":"short help", 
>        "pcp-longhelp":"long help" 
>      }
> } }
> 
> (Adding the json-schema fields is optional & orthogonal.)

I see what you are doing here, but I'm quite unsure. If your goal is to
handle JSON from a variety of sources, to my mind this is a step
backwards. Your more generic source isn't likely to output a schema in
that format.

> One benefit of a formal "pcp-name" field here is that the mapping from
> the JSON nesting structure need not match the pcp namespace exactly.
> It would let the json object name components be free of constraints
> like not containing dots (since we would not propagate them to pcp).

Validating names (no dots, spaces, etc. and not too long) is on my todo
list.

Currently, fields end up like the following:

stap_json.{STAP_MODULE_NAME}.{FIELD_NAME}

Originally I had designs of allowing the user to override
{STAP_MODULE_NAME}. But then we have issues with that field being
unique. For instance if the same systemtap script was run twice, both
would try to override the field to the same value. Since we're assured
that {STAP_MODULE_NAME} is unique, I just decided to go with it.

I'm not really fond of the 'pcp-name' field idea. It means more
validation (on both sides?) to disallow things like "foo.bar" being
a metric and then "foo.bar.baz" also being a metric.

>>         "net_xmit_data": {
>>           "type": "array",
>>           "description": "Network transmit data indexed by ethernet device",
>>           "additionalProperties": false,
>>           "items": {
>>             "type": "object",
>>             "additionalProperties": false,
>>             "properties": {
>>               "__id": {
>>                 "type": "string",
>>                 "additionalProperties": false
>>               },
>>               "xmit_count": {
>>                 "type": "integer",
>>                 "description": "number of packets for xmit device",
>>                 "additionalProperties": false
>>               },
>> [...]
> 
> Here's an alternative formulation kind of along the previous one:
> 
> 
> { "bar": { "networks":
>      { "pcp-indom-discriminator":"__id",  // parametrizing this
>        "type":"array",   // json-schema style identification of array-ness
>        "items": {
>           "xmit": {
>             "pcp-name":"bar.xmitfoo",
>             "pcp-type":"float",           // (stap can print fp with some effort)
>             "pcp-units":"Bytes/hour",
>             "pcp-semantics":"instant",
>             "pcp-shorthelp":"short help", 
>             "pcp-longhelp":"long help" 
>             }
>       } }
> } }
> 
> 
> Again, the json-schema parts are mostly orthogonal (just kiting the
> array-ness description).
> 
> So what would something like this give us?  At the pure stap-pmda
> level, not that much extra over what the 0.1 prototype has.  But
> beyond stap, we may well be able to write some schema for more general
> json files, trivially e.g. ones that lack the "data" as top-level
> wrapper.
>
> Before I go and write a bigger example schema of some other JSON data
> we have lying around, do you see what I'm getting at?  Do you agree
> that this style is also implementable in the python pmda?

This is probably implementable, although I do lose the easy data/schema
validation provided by the stock python JSON stuff.

I guess I'm coming at this from a different angle.

- If we want this pmda to (one day) support more generic JSON sources,
we'll have to expect generic JSON schemas.

- If we'd like the systemtap side of things to be able to support other
data collectors (nagios, zabbix, etc.), it should export a fairly
generic JSON schema.

To my mind, the changes you've got here take us farther from both goals.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)


* Re: systemtap/pcp integration pmda 0.1
  2014-09-23 14:39   ` David Smith
@ 2014-09-23 18:44     ` Frank Ch. Eigler
  0 siblings, 0 replies; 4+ messages in thread
From: Frank Ch. Eigler @ 2014-09-23 18:44 UTC (permalink / raw)
  To: David Smith; +Cc: Systemtap List, pcp

Hi -

dsmith wrote:
> [...]
> > probe json_data {
> >    @json_output_string_value("xstring", "testing 1, 2, 3", "Test String")
> >    @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count", 3,
> >                          "sum of latency for xmit device", $UNITS/SCALE)
> > }
> > 
> > so that the metadata is attached at the end of the data-supplying calls?
> 
> Hmm, I hadn't considered that. It might be possible to do, but seems
> quite tricky in getting the macros right to support both schema output
> and data output.

The macros wouldn't need to -output- the schema, only update the stap
tapset globals from which the schema can be read later / separately.
(OTOH, it could indeed generate the schema document at the same time,
and store it in some other stap global variable, so that the procfs
schema-reader could be just a string copy-out.)
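In python terms the side-effect idea is roughly this (all names
hypothetical, sketching the tapset-macro behavior rather than actual
stap code):

```python
import json

# Each data-output call also records that metric's metadata, so a
# schema document can later be generated from the accumulated state;
# the schema-reader then only has to copy the string out.
class JsonExporter:
    def __init__(self):
        self.data = {}
        self.metadata = {}

    def output_string_value(self, name, value, shorthelp=""):
        self.data[name] = value
        # metadata registration piggybacks on the data-supplying call
        self.metadata[name] = {"type": "string",
                               "description": shorthelp}

    def schema(self):
        return json.dumps({"properties": self.metadata})

exp = JsonExporter()
exp.output_string_value("xstring", "testing, 1, 2, 3", "Test string")
```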


> >> {
> >>   "generation": 1,
> >>   "data": {
> > 
> > (IMHO we shouldn't mandate such wrappers.)
> 
> Here's the deal here. I stole the "generation" idea from the mmv code.
> If I want to support being able to add/remove fields on the fly, I have
> to let the pmda know something has changed. [...]

Understood, good idea.  Such a field could be optional & identified by
another pcp metadata field rather than hardcoded.  Or the schema could
be reread regularly.


> I see what you are doing here, but I'm quite unsure. If you goal is to
> handle JSON from a variety of sources, to my mind this is a step
> backwards. Your more generic source isn't likely to output a schema in
> that format.

I wasn't explaining this part well, sorry.  The idea is:

- the stap source of json data would programmatically emit both data &
  pcp-schema

- non-stap sources of json data would emit data (in their own
  preexisting custom format, not aware of pcp!), and a pcp-schema for
  it would be hand-written by us

- both of the above would be usable by the *same* pmda code, making it
  a schema-driven processor of general json data

(Perhaps conflating the words "schema" and "metadata" is not helping.)


> > One benefit of a formal "pcp-name" field here is that the mapping from
> > the JSON nesting structure need not match the pcp namespace exactly.
> > It would let the json object name components be free of constraints
> > like not containing dots (since we would not propagate them to pcp).
> 
> Validating names (no dots, spaces, etc. and not too long) is on my todo
> list.

Right; my point is that instead of imposing such a constraint on the
JSON data structure, this could be a constraint on the pcp-specific
metadata tags in the metadata file.


> Originally I had designs of allowing the user to override
> {STAP_MODULE_NAME}. But then we have issues with that field being
> unique. For instance if the same systemtap script was run twice, both
> would try to override the field to the same value. Since we're assured
> that {STAP_MODULE_NAME} is unique, I just decided to go with it.

Yeah, that makes it simple, though stap_XXXXX names are hard to
predict/reuse, and stap -m FOO is also inconvenient.  Perhaps the
schema could include a suggested root name, which the pmda could use
to resolve or reject ties amongst duplicates.
 

> I'm not really fond of the 'pcp-name' field idea. It means more
> validation (on both sides?) in not allowing things like "foo.bar" being
> a value and then "foo.bar.baz" being a value.

The pmda would be in a comfortable position to check such PCP PMNS
constraints, since it'd know every pcp-name used in a schema.


> This is probably implementable, although I do lose the easy
> data/schema validation provided by the stock python JSON stuff.

(Well, not necessarily, as the pcp-* attributes could be just added to
a json-schema.org schema, so the same overall file can serve both
purposes.  Again recall though that we are not really obligated to
validate the random JSON data against any consistency with a schema;
we really only want to pull out designated parts of it for relaying to
PCP.)


> I guess I'm coming at this from a different angle.
> 
> - If we want this pmda to (one day) support more generic JSON sources,
> we'll have to expect generic JSON schemas.

> - If we'd like the systemtap side of things to be able to support other
> data collectors (nagios, zabbix, etc.), it should export a fairly
> generic JSON schema.

> To my mind, the changes you've got here take us farther from both goals.

I hope the above clarifies why this is not actually the case.  We get
to design a *specific* schema/metadata grammar for PCP, and our
tooling would construct these files (e.g., the stap tapset), or our
tools would *include* these files (e.g., imagine writing out by hand a
pcp-name etc.  metadata file for the CEPH JSON data, and including
that with the pmda).


- FChE

