* suitability of PCP for event tracing
@ 2010-08-27 15:39 Frank Ch. Eigler
  2010-08-29 15:55 ` [pcp] " Ken McDonell
  [not found] ` <4C7A7DFE.2040606@internode.on.net>
  0 siblings, 2 replies; 14+ messages in thread

From: Frank Ch. Eigler @ 2010-08-27 15:39 UTC (permalink / raw)
To: pcp; +Cc: systemtap

Hi -

We're investigating to what extent the PCP suite may be suitable for more general low-level event tracing. Just from docs / source gazing (so please excuse my terminology errors), a few challenges would seem to be:

* poll-based data gathering

  It seems as though PMDAs are used exclusively in 'polling' mode, meaning that underlying system statistics are periodically queried and summary results stored. In our context, it would be useful if PMDAs could push event data into the stream as they occur - perhaps hundreds of times a second.

* relatively static pmns

  It would be desirable if PMNS metrics were parametrizable with strings/numbers, so that a PMDA engine could use it to synthesize metrics on demand from a large space. (Example: have a "kernel-probe" PMNS namespace, parametrized by function name, which returns statistics of that function's execution. There are too many kernel functions, and they vary from host to host enough, so that enumerating them as a static PMNS table would be impractical.)

* scalar payloads

  It seems as though each metric value provided by PMDAs is necessarily a scalar value, as opposed to some structured type. For event tracing, it would be useful to have tuples. Front-ends could choose the interesting fields to render. (Example: tracing NFS calls, complete with decoded payloads.)

* filtering

  It would be desirable for the apps fetching metric values to communicate a filtering predicate associated with them, perhaps as per pmie rules. This is to allow the data server daemon to reduce the amount of data sent to the gui frontends. Perhaps also it could use them to inform PMDAs as a form of subscription, and in turn they could reduce the amount of data flow.

* no web-based frontends

  In our usage, it would be desirable to have some mini pcp-gui that is based on web technologies rather than Qt.

To what extent could/should PCP be used/extended to cover this space?

- FChE

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pcp] suitability of PCP for event tracing
  2010-08-27 15:39 suitability of PCP for event tracing Frank Ch. Eigler
@ 2010-08-29 15:55 ` Ken McDonell
  2010-09-01 15:05 ` David Smith
  [not found] ` <4C7A7DFE.2040606@internode.on.net>
  1 sibling, 1 reply; 14+ messages in thread

From: Ken McDonell @ 2010-08-29 15:55 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: systemtap

[resending as text so it makes it to the systemtap list]

On 28/08/2010 1:39 AM, Frank Ch. Eigler wrote:
> Hi -
>
> We're investigating to what extent the PCP suite may be suitable for
> more general low-level event tracing. Just from docs / source gazing
> (so please excuse my terminology errors), a few challenges would seem
> to be:

G'day Frank and others.

Apologies for the length of this reply, but there are a number of non-trivial issues at play here. Nathan has already answered some of your questions. I'd like to start by providing some historical and design center context.

From the outset PCP was *not* designed for event tracing, but PCP *was* designed for a specific class of performance monitoring and management scenarios. The table below outlines some of the differences ... these help to explain why PCP is /a priori/ not necessarily suitable for event tracing. This does not mean PCP could not evolve to support event tracing in the ways Nathan has suggested; we just need to understand that the needs are different, and make sure we do not end up morphing PCP into something that no longer works for the original design center and may not work all that well for event tracing.

Locality of data processing
  PCP design center: The monitored system is typically not the same system on which the collection and/or analysis is performed.
  Event tracing: Data collection happens on the system being monitored; analysis may happen later on another system.

Real time analysis
  PCP design center: Central to the design requirements.
  Event tracing: Often not required, other than edge-triggers to start and stop collection.

Retrospective analysis
  PCP design center: Central to the design requirements.
  Event tracing: Central to the design requirements.

Time scales
  PCP design center: We are typically concerned with large and complex systems where average levels of activity over periods of the order of tens of seconds are representative.
  Event tracing: Short-term behaviour and transients are often important, and inter-arrival times for events may be on the order of milliseconds.

Data rates
  PCP design center: Moderate. Monitoring is often long-term, requiring broad and shallow data collection, with a small number of narrow and deep collections aligned to known or suspected problem areas.
  Event tracing: Very high. Monitoring is most often narrow, deep and short-lived.

Data spread
  PCP design center: Very broad ... interesting data may come from a number of places, e.g. hardware instrumentation, operating system stats, service layers and libraries, applications and distributed applications.
  Event tracing: Very narrow ... one source and one host.

Data semantics
  PCP design center: A very broad range, but the most common are activity levels and event *counters* (with little or no event parameter information).
  Event tracing: Very specific, being the record of an event and its parameters with a high-resolution time stamp.

Data source extensibility
  PCP design center: Critical.
  Event tracing: Rare.

So with this background, let's look at Frank's specific questions.

> * poll-based data gathering
>
> It seems as though PMDAs are used exclusively in 'polling' mode,
> meaning that underlying system statistics are periodically queried
> and summary results stored. In our context, it would be useful if
> PMDAs could push event data into the stream as they occur - perhaps
> hundreds of times a second.

Yep, this would be a big change. There is not really a data stream in PCP ... there is a source of performance metrics (a host or an archive), and clients connect to that source and pull data at a sample interval defined by the client. At the host source, the co-ordinating daemon (pmcd) maintains no cache nor stream of recent data ... a client asks for a specific subset of the available information; this is instantiated and returned to the client. There is no requirement for the subsets of the requested information to be the same for consecutive requests from a single client, and pmcd is receiving requests from a number of clients that are handled completely independently.

As Nathan has suggested, if event traces are intended for retrospective analysis (as opposed to event counters being suited for either real time or retrospective analysis), then there is an alternative approach, namely to create a PCP archive directly from a source of data without involving pmcd or a pmda or pmlogger. We've recently reworked the "pmimport" services to expose better APIs to support just this style of use ... see LOGIMPORT(3) and sar2pcp(1) for an example. I think this approach is possibly a better semantic match between PCP and a stream of event records.

> * relatively static pmns
>
> It would be desirable if PMNS metrics were parametrizable with
> strings/numbers, so that a PMDA engine could use it to synthesize
> metrics on demand from a large space. (Example: have a
> "kernel-probe" PMNS namespace, parametrized by function name, which
> returns statistics of that function's execution. There are too many
> kernel functions, and they vary from host to host enough, so that
> enumerating them as a static PMNS table would be impractical.)

This is not so much of a problem. We've relaxed the PMNS services to allow PMDAs to dynamically define new metrics on the fly. And as Nathan has pointed out, the instance domain provides a dynamic dimension for the available metric values that may also be useful, e.g. this is how all of procfs is instantiated.
> * scalar payloads
>
> It seems as though each metric value provided by PMDAs is
> necessarily a scalar value, as opposed to some structured type. For
> event tracing, it would be useful to have tuples. Front-ends could
> choose the interesting fields to render. (Example: tracing NFS
> calls, complete with decoded payloads.)

We've tried really hard to make the PCP metadata rich enough (in the data model and the API services) to enable clients to be data-driven, based on what performance data happens to be available today from a host or archive. This is why the data aggregate (or blob) data type that Nathan has mentioned is rarely used (although it is fully supported). If there was a tight coupling between the source of the event data and the client that interprets the event data, then the PCP data aggregate could be used to provide a transport and storage encapsulation that is consistent with the PCP APIs and protocols. Of course, such a client would be exposed to all of the word-size, endian and version issues that plague other binary formats for performance data, e.g. the sar variants based on AT&T UNIX.

> * filtering
>
> It would be desirable for the apps fetching metric values to
> communicate a filtering predicate associated with them, perhaps as
> per pmie rules. This is to allow the data server daemon to reduce
> the amount of data sent to the gui frontends. Perhaps also it could
> use them to inform PMDAs as a form of subscription, and in turn they
> could reduce the amount of data flow.

PMDAs are free to do as much or as little work as they choose. Some are totally demand-driven, instantiating only the information they are asked for when they are asked for it. Others use caching strategies to refresh some or all of the information at each request. Others maintain timestamped caches and only refresh when the information is deemed "stale". Another class runs a refresh thread that is continually updating a data cache, and requests are serviced from the cache.

The PMDA behaviour can be modal ... based on client requests, or more interestingly, as Nathan has suggested, using the pmStore(3) API to allow one or more clients to enable/disable collection (think about expensive, detailed information that you don't want to collect unless some client *really* wants it). The values passed into the PMDA via pmStore(3) are associated with PCP metrics, so they have the full richness of the PCP data model to encode switches, text strings, blobs, etc.

> * no web-based frontends
>
> In our usage, it would be desirable to have some mini pcp-gui that
> is based on web technologies rather than QT.

There are several examples of web interfaces driven by PCP data ... but each of these has been developed as a proprietary and specific application and hence is not included in the PCP open source distribution. The PCP APIs provide all the services needed to build something like this.

> To what extent could/should PCP be used/extended to cover this space?

I think this suggestion is worth further discussion, but we probably need some more concrete examples of the sorts of event trace data that is being considered, and the most likely use cases and patterns for that data.

Cheers, Ken.

^ permalink raw reply [flat|nested] 14+ messages in thread
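[Editor's note: the pmStore(3) control pattern Ken describes - a client stores a value into a control metric to switch expensive collection on and off - can be sketched as follows. This is a toy Python simulation of the pattern only; the metric names and the ToyAgent class are invented for illustration and are not the real libpcp/PMDA API.]

```python
# Toy simulation of the pmStore(3) control-metric pattern: a client
# "stores" into a control metric, and the agent gates expensive event
# collection on that value.  All names here are illustrative only.

class ToyAgent:
    def __init__(self):
        self.controls = {"myagent.trace.enabled": 0}
        self.events = []

    def store(self, metric, value):
        # analogue of a pmStore(3) request arriving via pmcd
        if metric not in self.controls:
            raise KeyError(metric)
        self.controls[metric] = value

    def observe(self, event):
        # expensive detail is only recorded while a client has asked for it
        if self.controls["myagent.trace.enabled"]:
            self.events.append(event)

    def fetch(self, metric):
        # analogue of servicing a fetch request
        if metric == "myagent.trace.count":
            return len(self.events)
        return self.controls[metric]

agent = ToyAgent()
agent.observe("open()")                    # dropped: tracing disabled
agent.store("myagent.trace.enabled", 1)    # a client enables collection
agent.observe("read()")                    # recorded
agent.store("myagent.trace.enabled", 0)    # collection disabled again
agent.observe("close()")                   # dropped
print(agent.fetch("myagent.trace.count"))  # -> 1
```

The point of the pattern is exactly what Ken notes: the expensive work happens only while some client has asked for it, and the switch travels through the ordinary metric value channel.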
* Re: [pcp] suitability of PCP for event tracing
  2010-08-29 15:55 ` [pcp] " Ken McDonell
@ 2010-09-01 15:05 ` David Smith
  2010-09-06 16:39 ` Ken McDonell
  0 siblings, 1 reply; 14+ messages in thread

From: David Smith @ 2010-09-01 15:05 UTC (permalink / raw)
To: Ken McDonell; +Cc: Frank Ch. Eigler, systemtap

On 08/29/2010 10:54 AM, Ken McDonell wrote:

... stuff deleted ...

> As Nathan has suggested, if event traces are intended for retrospective
> analysis (as opposed to event counters being suited for either real time
> or retrospective analysis), then there is an alternative approach,
> namely to create a PCP archive directly from a source of data without
> involving pmcd or a pmda or pmlogger. We've recently reworked the
> "pmimport" services to expose better APIs to support just this style of
> use ... see LOGIMPORT(3) and sar2pcp(1) for an example. I think this
> approach is possibly a better semantic match between PCP and a stream of
> event records.

Hmm. If I'm understanding all the acronyms correctly, I'm not seeing the benefit of using LOGIMPORT to create a PCP archive vs. involving pmcd/pmda/pmlogger. Could you expand here?

Thanks.

--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pcp] suitability of PCP for event tracing
  2010-09-01 15:05 ` David Smith
@ 2010-09-06 16:39 ` Ken McDonell
  0 siblings, 0 replies; 14+ messages in thread

From: Ken McDonell @ 2010-09-06 16:39 UTC (permalink / raw)
To: David Smith; +Cc: Frank Ch. Eigler, systemtap

On 2/09/2010 1:05 AM, David Smith wrote:
> On 08/29/2010 10:54 AM, Ken McDonell wrote:
>
> ... stuff deleted ...
>
>> As Nathan has suggested, if event traces are intended for retrospective
>> analysis (as opposed to event counters being suited for either real time
>> or retrospective analysis), then there is an alternative approach,
>> namely to create a PCP archive directly from a source of data without
>> involving pmcd or a pmda or pmlogger. We've recently reworked the
>> "pmimport" services to expose better APIs to support just this style of
>> use ... see LOGIMPORT(3) and sar2pcp(1) for an example. I think this
>> approach is possibly a better semantic match between PCP and a stream of
>> event records.
>
> Hmm. If I'm understanding all the acronyms correctly, I'm not seeing
> the benefit of using LOGIMPORT to create a PCP archive vs. involving
> pmcd/pmda/pmlogger. Could you expand here?

David,

The "benefit" is that importing data to create a PCP archive is a data translation process that is not dependent on polled sampling of data ... you can consume a stream of timestamped data and create a corresponding PCP archive as an off-line or batch process. Import tools are also comparatively simple to write.

The "disadvantage" is that you're no closer to real-time monitoring with this approach, so it is usually used in cases where there is an existing body of historical data and one is interested in using pmie or pmchart or pmlogsummary for some retrospective analysis ... non-PCP tools like sar and monitoring subsystems that support "Export to Excel" are the most common examples.

^ permalink raw reply [flat|nested] 14+ messages in thread
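[Editor's note: the batch translation Ken describes - consuming a stream of timestamped records and producing one archive result set per timestamp - can be illustrated with a small sketch. This is a Python simulation of the data flow only; the record layout is invented, and the real C API is the one documented in LOGIMPORT(3).]

```python
# Toy illustration of the LOGIMPORT(3) style of use: consume pre-existing
# timestamped records off-line and group them into archive-style result
# sets, with no pmcd, PMDA or pmlogger in the loop.  The tuple layout
# (timestamp, metric name, value) is invented for illustration.
from collections import defaultdict

raw = [
    (1283100000.0, "disk.reads", 10),
    (1283100000.0, "disk.writes", 4),
    (1283100005.0, "disk.reads", 17),
]

archive = defaultdict(dict)
for stamp, metric, value in raw:
    archive[stamp][metric] = value   # one result set per timestamp

for stamp in sorted(archive):        # archives are written in time order
    print(stamp, archive[stamp])
```

As Ken says, this is a pure translation job - simple to write, but it gets you no closer to real-time monitoring.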
[parent not found: <4C7A7DFE.2040606@internode.on.net>]
* Re: [pcp] suitability of PCP for event tracing
  [not found] ` <4C7A7DFE.2040606@internode.on.net>
@ 2010-08-31 3:29 ` Greg Banks
  2010-08-31 19:49 ` Frank Ch. Eigler
  1 sibling, 0 replies; 14+ messages in thread

From: Greg Banks @ 2010-08-31 3:29 UTC (permalink / raw)
To: Ken McDonell; +Cc: Frank Ch. Eigler, systemtap, pcp

Ken McDonell wrote:
> On 28/08/2010 1:39 AM, Frank Ch. Eigler wrote:
>
> The table below outlines some of the differences ... these help to
> explain why PCP is /a priori/ not necessarily suitable for event
> tracing.
>
> [table elided]

I think another problem is the dynamic range of time scales. Event tracing tends to require analysis of behaviour that manifests at wildly varying time scales in the same trace, from the tens of seconds down to the microseconds. PCP's front ends are not very good at doing this kind of thing, and don't really handle zooming or level-of-detail (LoD) or bookmarking well.

>> * no web-based frontends
>>
>> In our usage, it would be desirable to have some mini pcp-gui that
>> is based on web technologies rather than QT.
>
> There are several examples of web interfaces driven by PCP data ...
> but each of these has been developed as a proprietary and specific
> application and hence is not included in the PCP open source
> distribution. The PCP APIs provide all the services needed to build
> something like this.

Myself and at least one other person on the PCP list have been involved with designing three generations of one such proprietary web front end, and we found it quite a difficult problem to solve. The main issue was that the PCP architecture is basically a stateless client-driven pull, so that any operation which needs to maintain state across multiple samples (like time averages, or rate conversion of counters) needs to be done all the way out in the client. Our browser requirements prevented us from using Javascript, so we had no practical way to do that, and had to insert a caching/rate conversion/averaging daemon in between. That daemon proved...troublesome.

These days a JS + AJAX + SVG solution would probably do the trick nicely, and would be interesting to write.

Also, Frank: you mentioned NFS in passing; I'm curious as to what exactly you're up to?

--
Greg.

^ permalink raw reply [flat|nested] 14+ messages in thread
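[Editor's note: the "state across multiple samples" problem Greg describes is easy to see with rate conversion of a counter - the client must remember the previous sample to turn a monotonically increasing counter into a rate. A minimal sketch, in Python for illustration:]

```python
# Client-side rate conversion of a counter metric: keep the previous
# (timestamp, value) pair and report delta/interval.  This state has to
# live in the client, because pmcd keeps no per-client history.
class RateConverter:
    def __init__(self):
        self.prev = None   # (timestamp, counter_value)

    def update(self, stamp, value):
        rate = None
        if self.prev is not None:
            dt = stamp - self.prev[0]
            if dt > 0:
                rate = (value - self.prev[1]) / dt
        self.prev = (stamp, value)
        return rate        # None until two samples have been seen

rc = RateConverter()
print(rc.update(0.0, 1000))   # -> None (no previous sample yet)
print(rc.update(5.0, 1500))   # -> 100.0 counts/second
```

A browser front end without such state (e.g. one forbidden from running Javascript, as in Greg's case) cannot do even this much, which is why an intermediate daemon was needed.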
* Re: [pcp] suitability of PCP for event tracing
  [not found] ` <4C7A7DFE.2040606@internode.on.net>
  2010-08-31 3:29 ` Greg Banks
@ 2010-08-31 19:49 ` Frank Ch. Eigler
  2010-09-01 6:25 ` Mark Goodwin
  2010-09-12 16:43 ` Ken McDonell
  1 sibling, 2 replies; 14+ messages in thread

From: Frank Ch. Eigler @ 2010-08-31 19:49 UTC (permalink / raw)
To: pcp, systemtap

Hi -

Thanks, Nathan, Ken, Greg, Mark, for clarifying the status quo and some of the history. We understand that the two problem domains are traditionally handled with the event-tracing -vs- stats-monitoring distinction. We're trying to see where best to focus efforts to make some small steps to bridge the two, where plenty of compromises are possible. We'd prefer to help build on an existing project with a nice community than to do new stuff.

For the poll-based data gathering issue, a couple of approaches came up:

(1) bypassing pmcd and generating a pmarchive file directly from trace data

    This appears to imply continuing the archive-vs-live dichotomy that makes it difficult for clients to process both recent and current data seamlessly together. Since using such files would probably also need a custom client, we'd not be using much of the pcp infrastructure, only as a passive data encoding layer. This may not be worthwhile.

(2) protocol extensions for live-push on pmda and pmcd-client interfaces

    This clearly larger effort is only worth undertaking with the community's sympathy and assistance. It might have some interesting integration possibilities with the other tools, especially pmie (the inference engine).

For the static-pmns issue, the possibility of dynamic instance domains and metric subspaces is probably sufficient, if the event parameters are limited to only 1-2 degrees of freedom. (In contrast, imagine browsing a trace of NFS or kernel VFS operations; these have ~5 parameters.)

For the scalar-payloads issue, the BLOB/STRING metric types are indeed available but are opaque to other tools, so don't compose well. Would you consider one additional data type, something like a JSON[1] string? It would be self-describing, with pmie and general processing opportunities, though those numbers would lack the PMDA_PMUNITS dimensioning.

For the filtering issue, pmStore() is an interesting possibility, allowing the PMDAs to bear the brunt. OTOH, if pmcd evolved into a data-push-capable widget, it could serve as a filtering proxy, requiring a separate API or interpretation of the pmStore data.

For the web-based frontend issue, yeah, javascript+svg+etc. sounds most promising, especially if it can be made to speak the native wire protocol to pmcd. This would seem to argue for a stateful archive-serving pmcd, or perhaps an archive-serving proxy, as in Greg's old project.

Is this sounding reasonable?

- FChE

[1] http://en.wikipedia.org/wiki/JSON

^ permalink raw reply [flat|nested] 14+ messages in thread
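[Editor's note: Frank's proposed JSON-valued metric would carry one self-describing event per string, so a generic front end can pick fields to render or filter on without PCP-level metadata. A sketch of what such a payload might look like - the field names are invented for illustration:]

```python
# One event encoded as the kind of self-describing JSON string Frank
# proposes.  Field names are illustrative; as he notes, the numeric
# fields would carry no PMDA_PMUNITS dimensioning.
import json

event = json.dumps({
    "ts": 1283283000.123456,   # high-resolution timestamp
    "op": "nfs3_read",         # decoded operation name
    "inode": 12345,
    "bytes": 4096,
})

# A front end chooses the interesting fields to render:
decoded = json.loads(event)
interesting = {k: decoded[k] for k in ("op", "bytes")}
print(interesting)   # -> {'op': 'nfs3_read', 'bytes': 4096}
```

Because the payload names its own fields, a pmie-like filter (e.g. "bytes > 1024") could in principle be evaluated against it without per-metric schema knowledge.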
* Re: [pcp] suitability of PCP for event tracing
  2010-08-31 19:49 ` Frank Ch. Eigler
@ 2010-09-01 6:25 ` Mark Goodwin
  2010-09-02 2:05 ` Greg Banks
  2010-09-12 16:43 ` Ken McDonell
  1 sibling, 1 reply; 14+ messages in thread

From: Mark Goodwin @ 2010-09-01 6:25 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: pcp, systemtap

On 09/01/2010 05:49 AM, Frank Ch. Eigler wrote:
> Hi -
>
> Thanks, Nathan, Ken, Greg, Mark, for clarifying the status quo and
> some of the history.
>
> We understand that the two problem domains are traditionally handled
> with the event-tracing -vs- stats-monitoring distinction. We're trying
> to see where best to focus efforts to make some small steps to bridge
> the two, where plenty of compromises are possible. We'd prefer to
> help build on an existing project with a nice community than to do new
> stuff.

yes certainly :)

> For the poll-based data gathering issue, a couple of approaches came up:
>
> (1) bypassing pmcd and generating a pmarchive file directly from
>     trace data. This appears to imply continuing the archive-vs-live
>     dichotomy that makes it difficult for clients to process both
>     recent and current data seamlessly together.

one of the issues with the live vs archive dichotomy is that live data is always available (since you're requesting it explicitly from a PMDA that is otherwise passive), whereas the archive data is not available unless configured to be collected before-hand (see pmlogger). There is too much data to collect everything all the time - it is too intrusive and impractical, so some form of filtering and/or aggregation needs to be done (see pmlogsummary, and Greg's old project too).

> Since using such
> files would probably also need a custom client, then we'd not be
> using much of the pcp infrastructure, only as a passive data
> encoding layer. This may not be worthwhile.
>
> (2) protocol extensions for live-push on pmda and pmcd-client interfaces
>     This clearly larger effort is only worth undertaking with the
>     community's sympathy and assistance. It might have some
>     interesting integration possibilities with the other tools,
>     especially pmie (the inference engine).

yep - I suspect Ken and maybe Nathan would have further comments on this

> For the static-pmns issue, the possibility of dynamic instance
> domains, metric subspaces is probably sufficient, if the event
> parameters are limited to only 1-2 degrees of freedom. (In contrast,
> imagine browsing a trace of NFS or kernel VFS operations; these have
> ~5 parameters.)

PCP instance domains are traditionally single dimensional, though there are a few exceptions such as kernel.percpu.interrupts. It's easy enough to split multi-dimensional data structures out into multiple metrics with a common instance domain.

> For the scalar-payloads issue, the BLOB/STRING metric types are indeed
> available but are opaque to other tools, so don't compose well. Would
> you consider one additional data type, something like a JSON[1]
> string? It would be self-describing, with pmie and general processing
> opportunities, though those numbers would lack the PMDA_PMUNITS
> dimensioning.

this could work using string or binary blob data types in the existing protocols - though there is a size limit. And one of the blessed features of PCP is that the client monitoring tools can more or less monitor any metrics - so any solution here would also need specially crafted client tools. Extensions to the Perl binding would probably work best, e.g. interfacing with perl-JSON-*

> For the filtering issue, pmStore() is an interesting possibility,
> allowing the PMDAs to bear the brunt. OTOH, if pmcd evolved into a
> data-push-capable widget, it could serve as a filtering proxy,
> requiring a separate API or interpretation of the pmStore data.

well pmcd is already data-push capable using the pmstore interface, allowing clients to store values for certain metrics in some of the PMDAs. Filtering and parsing is done by the PMDA itself and pmcd just acts as a proxy passthru (kind of a back-channel to the pull interface). pmstore hasn't really been used in anger like this though - more just for setting config & control options and the like. The same (or similar) protocol has also been used for a data source to open a socket directly to a PMDA and tie into the PMDA's select loop, rather than going via pmcd.

> For the web-based frontend issue, yeah, javascript+svg+etc. sounds
> most promising, especially if it can be made to speak the native wire
> protocol to pmcd. This would seem to argue for a stateful
> archive-serving pmcd, or perhaps an archive-serving proxy, as in Greg's
> old project.

Time averaging, aggregation and filtering were all ambitious aims of the project Greg's talking about - I wonder if that code could ever be resurrected and open sourced? One abomination here was that a PMDA could also be a client - and potentially query itself for metrics(!)

> Is this sounding reasonable?

it's going to take a lot more discussion, but enthusiasm seems to be on our side :)

Cheers
-- Mark Goodwin

^ permalink raw reply [flat|nested] 14+ messages in thread
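[Editor's note: Mark's suggestion of splitting a multi-dimensional data structure into several metrics sharing one instance domain can be sketched concretely. The metric and instance names below are invented for illustration:]

```python
# Splitting a tuple of per-event parameters into multiple scalar metrics
# over a common instance domain, as Mark suggests.  Names are invented.
events = {
    # instance name -> (operation, pid, bytes)
    "ev.0001": ("read", 4242, 8192),
    "ev.0002": ("write", 4242, 512),
}

# Three scalar metrics, all indexed by the same instance domain:
vfs_op    = {inst: t[0] for inst, t in events.items()}
vfs_pid   = {inst: t[1] for inst, t in events.items()}
vfs_bytes = {inst: t[2] for inst, t in events.items()}

# A client re-joins the tuple by instance name:
inst = "ev.0002"
print(vfs_op[inst], vfs_pid[inst], vfs_bytes[inst])  # -> write 4242 512
```

The join key is the shared instance identifier, which is how existing PCP clients already correlate values across metrics in one instance domain.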
* Re: [pcp] suitability of PCP for event tracing
  2010-09-01 6:25 ` Mark Goodwin
@ 2010-09-02 2:05 ` Greg Banks
  2010-09-02 19:40 ` Frank Ch. Eigler
  0 siblings, 1 reply; 14+ messages in thread

From: Greg Banks @ 2010-09-02 2:05 UTC (permalink / raw)
To: Mark Goodwin; +Cc: Frank Ch. Eigler, systemtap, pcp

Mark Goodwin wrote:
> On 09/01/2010 05:49 AM, Frank Ch. Eigler wrote:
>
>> For the static-pmns issue, the possibility of dynamic instance
>> domains, metric subspaces is probably sufficient, if the event
>> parameters are limited to only 1-2 degrees of freedom. (In contrast,
>> imagine browsing a trace of NFS or kernel VFS operations; these have
>> ~5 parameters.)
>
> PCP instance domains are traditionally single dimensional, though there
> are a few exceptions such as kernel.percpu.interrupts. It's easy enough
> to split multi-dimensional data structures out into multiple metrics with
> a common instance domain.

Two comments. Firstly, do you need to view the actual parameters involved when fetching values, or just use those parameters for filtering purposes to select some subset of all VFS operations (e.g. "show me read()s and write()s to inode 12345 on /foo")?

Secondly, there's a "convention" for encoding faux multiple-dimension instance names, but it's really just a horrible hack for encoding an arbitrary tuple as a single string, like awk does.

>> For the web-based frontend issue, yeah, javascript+svg+etc. sounds
>> most promising, especially if it can be made to speak the native wire
>> protocol to pmcd.

It certainly could do, but for firewall and AJAX friendliness I'd vote for wrapping it in HTTP, XML-RPC style.

>> This would seem to argue for a stateful
>> archive-serving pmcd, or perhaps an archive-serving proxy, as in Greg's
>> old project.
>
> Time averaging, aggregation and filtering were all ambitious aims
> of the project Greg's talking about - I wonder if that code could
> ever be resurrected and open sourced?

Euurgh, dear Lord nonono :(

Frank: that project didn't serve archives, it had a PMDA component which presented new metrics which were rate converted and averaged versions of existing metrics. This wasn't the best of ideas:

> One abomination here was
> that a PMDA could also be a client - and potentially query itself
> for metrics(!)

Doing it again (the fourth time!), I would not try that particular stunt again. Instead I would abandon all attempts at building a time machine, push all the brains out to JS code in the browser, and create a very simple stateless HTTP-to-PCP protocol bridge daemon to allow PCP data to be shipped from pmcd to frontend code as either XML or JSON. Modern browsers have sufficiently fast and functional JS engines that this is now feasible.

Alternately, and this is a lot more risky, I'd add rate conversion and time-averaging features to pmcd.

--
Greg.

^ permalink raw reply [flat|nested] 14+ messages in thread
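[Editor's note: the stateless HTTP-to-PCP bridge Greg proposes could be as small as "one HTTP GET per fetch, values re-encoded as JSON". A minimal sketch follows; the URL-less handler shape and the fake fetch function are assumptions for illustration - a real bridge would call the PMAPI and talk to pmcd.]

```python
# Minimal sketch of Greg's stateless bridge: one request maps to one
# fetch, and the result is re-encoded as JSON for browser-side JS.
# fake_pmfetch() is a stand-in for a real pmFetch() against pmcd.
import json

def fake_pmfetch(metrics):
    # pretend these values just came back from pmcd
    sample = {"kernel.all.load": 0.42, "disk.all.read": 123456}
    return {m: sample[m] for m in metrics if m in sample}

def handle_get(query_metrics):
    # one stateless request -> one JSON document; any rate conversion
    # or time averaging stays in the browser, as Greg suggests
    values = fake_pmfetch(query_metrics)
    return json.dumps({"values": values})

print(handle_get(["kernel.all.load"]))
```

Keeping the daemon stateless is the design point: it avoids rebuilding the per-client "time machine" that made the earlier caching/averaging daemon troublesome.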
* Re: [pcp] suitability of PCP for event tracing
  2010-09-02 2:05 ` Greg Banks
@ 2010-09-02 19:40 ` Frank Ch. Eigler
  0 siblings, 0 replies; 14+ messages in thread

From: Frank Ch. Eigler @ 2010-09-02 19:40 UTC (permalink / raw)
To: Greg Banks; +Cc: Mark Goodwin, systemtap, pcp

Greg Banks <gnb@evostor.com> writes:

> [...]
> Firstly, do you need to view the actual parameters involved when
> fetching values, or just use those parameters for filtering purposes
> to select some subset of all VFS operations (e.g. "show me read()s and
> write()s to inode 12345 on /foo") ?

You mean whether they may be needed only for filtering control, and not for display? I'm sure it's needed for display too - else a user might not know what to filter on.

> Secondly, there's a "convention" for encoding faux
> multiple-dimension instance names, but it's really just a horrible
> hack for encoding an arbitrary tuple as a single string, like awk
> does.

Yeah. OTOH if filtering needs to be done in an intermediate layer like the PMCD or PMLOG* or PMPROXY, then tuple-wide data and its operations would need to be more first-class, instead of being smuggled in a PM_TYPE_STRING.

>>> For the web-based frontend issue, yeah, javascript+svg+etc. sounds
>>> most promising, especially if it can be made to speak the native wire
>>> protocol to pmcd.

> It certainly could do, but for firewall and AJAX friendliness I'd vote
> for wrapping it in HTTP, XML-RPC style.

Sure; that could be tackled later / orthogonally in principle. But since modern javascript appears to lack low-level socket access APIs, this may have to be done whether we like it or not. (Or go Java.)

>> [...]
>> Time averaging, aggregation and filtering were all ambitious aims
>> of the project Greg's talking about - I wonder if that code could
>> ever be resurrected and open sourced?

> Euurgh, dear Lord nonono :(
>
> Frank: that project didn't serve archives, it had a PMDA component
> which presented new metrics which were rate converted and averaged
> versions of existing metrics. This wasn't the best of ideas:

I can see how one could interpret filtering in the middle as necessitating computing virtualized metrics, and that does seem complicated, but I was not trying to get into that area.

> [...] Instead I would abandon all attempts at building a time
> machine, push all the brains out to JS code in the browser, and
> create a very simple stateless HTTP-to-PCP protocol bridge daemon to
> allow PCP data to be shipped from pmcd to frontend code as either
> XML or JSON. Modern browsers have sufficiently fast and functional
> JS engines that this is now feasible.

OK, then it looks like we'd have at least a few separate pieces to work on:

* extensions to the PMCD<->PMDA API/protocol to allow PMDAs to push event data, and corresponding extensions for PMclients<->PMCD
* teaching some of the existing clients to process such data
* a systemtap PMDA that listens to pmStore filtering/control instructions; probably using plain type STRING for JSON payload
* a PMCD<->XMLRPC bridge
* the web application itself

- FChE

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pcp] suitability of PCP for event tracing 2010-08-31 19:49 ` Frank Ch. Eigler 2010-09-01 6:25 ` Mark Goodwin @ 2010-09-12 16:43 ` Ken McDonell 2010-09-13 2:21 ` Greg Banks ` (2 more replies) 1 sibling, 3 replies; 14+ messages in thread From: Ken McDonell @ 2010-09-12 16:43 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: pcp, systemtap Apologies for my tardiness in responding, but I'm travelling at the moment (typing this on a train and then on a ferry somewhere in Norway). On 1/09/2010 5:49 AM, Frank Ch. Eigler wrote: > Hi - > > Thanks, Nathan, Ken, Greg, Mark, for clarifying the status quo and > some of the history. > > ... > > (2) protocol extensions for live-push on pmda and pmcd-client interfaces > This clearly larger effort is only worth undertaking with the > community's sympathy and assistance. It might have some > interesting integration possibilities with the other tools, > espectially pmie (the inference engine). I'd like to go back to a comment Nathan made at the start of this thread, namely to try and get a clear idea of the problem we're trying to solve here and the typical use cases. I think it is important to get all of this on the table before we start too much of a discussion about possible evolutionary change for PCP (something I am very supportive of, in general terms). Some of the suggestions to date include ... 
+ being able to push data from pmcd asynchronously to clients, as opposed to the time-based pulling from the clients that we support today + data filtering predicates pushed from a client to pmcd and then on to a pmda to enable or restrict the types of events or conditions on event parameters that would be evaluated before asynchronously sending matching events to the client + handling event records with associated event parameters as an extended data type + additional per-client state data being held in pmcd to allow rate aggregation (and similar temporal averaging) to be done at pmcd, rather than the client [note I have a long-standing objection to this approach based on the original design criteria that pmcd needs to be mean and lean to reduce impact, and data reduction and analysis should be pushed out to the clients where the computation can be done without impacting the system being monitored ... but maybe it is time to revisit this, as the current environments where PCP is being used may differ from those we were concerned with in 1994] + better support for web-based monitoring tools (although Javascript evolution may make this less pressing than it was 5 years ago) + better support for analysis that spans the timeline between the current and the recent past This is already a long list, with the work items spanning about 2 orders of magnitude of effort. It would be good to drive towards consensus on this list of items, and then prioritize them. Depending on the set of goals we agree on, there may even be a place to consider maintaining the poll-based mechanism, but the export data is a variable length buffer of all event records (each aggregated and self-identifying as below) seen since the last poll-based sample. Returning to Frank's point, I'm not sure pmie would be able to consume asynchronous events ...
it is already a very complicated predicate engine with the notion of rules being scheduled and evaluated with fixed (but not necessarily identical) evaluation intervals for each rule. Some of the aggregation, existential, universal and percentile predicates don't have sound semantics in the presence of asynchronous data arrival, e.g. some_inst(), all_inst(), count_sample(), etc. > For the static-pmns issue, the possibility of dynamic instance > domains, metric subspaces is probably sufficient, if the event > parameters are limited to only 1-2 degrees of freedom. (In contrast, > imagine browsing a trace of NFS or kernel VFS operations; these have > ~5 parameters.) I am not sure this is a problem. Each event has a unique timestamp, so each parameter could be encoded as a PCP metric of the appropriate type and semantics. If that is not adequate, then the best approach would seem to be to extend the base data types to include some sort of self-encoded aggregate ... I don't have a strong view on which of the existing "standards" should be adopted here, but it does not appear to be a hard problem at the PCP protocol layer ... constructing the aggregate would be a PMDA (or similar) responsibility and interpretation of the aggregate is a client responsibility, although it would be more consistent with the PCP approach if the aggregate included the semantics and metadata for the parameters, even if this is only delivered once per client connection. >... On 3/09/2010 5:39 AM, Frank Ch. Eigler wrote: > ... > > OK, then it looks like we'd have at least a few separate pieces to > work on: > > * extensions to the PMCD<->PMDA API/protocol to allow PMDAs to push > event data, and corresponding extensions for PMclients<->PMCD I'd really like to see some more discussion on how people think this is going to work.
None of the PCP libraries are thread-safe (again a deliberate design decision at the original point of conception), and asynchronous delivery of data from pmdas through pmcd to clients increases the likelihood that people will want to use multiple threads to handle PCP calls. There are some asynchronous calls that were grafted onto libpcp later on, but these have very little use in existing code and no QA coverage. > * teaching some of the existing clients to process such data As I mentioned above, I think we need to preserve the metadata concepts in the PCP protocols so that this data does not become opaque and only understood by the producer and consumer (one of my long-standing complaints about SNMP and the MIB concept which PCP has so far done a better job of addressing). > * a systemtap PMDA that listens to pmStore filtering/control instructions; > probably using plain type STRING for JSON payload Currently there is no client identification (and hence no notion of a session) that is passed down from pmcd to the pmdas, so how would this filtering work in the presence of multiplexed requests coming from a number of clients? And when would the filtering stop? And is it possible that multiple clients could request filter predicates that are mutually exclusive? > * a PMCD<->XMLRPC bridge I am not sure that pmcd is the right place to put this ... if the bridge was a client of pmcd, this would be more PCP-like, and match the way in which pmproxy (and to a lesser extent derived metrics) are supported. > * the web application itself I'm not a web guy, but this seems the simplest piece of the puzzle ... 8^)> I suspect the proposals here are substantive enough that they require a white paper and discussion, rather than a convoluted email thread. If I could get some feedback and answers to my questions, I'd be happy to put together an initial document to guide the discussion ... if someone else wants to drive, that's fine by me also.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [pcp] suitability of PCP for event tracing 2010-09-12 16:43 ` Ken McDonell @ 2010-09-13 2:21 ` Greg Banks 2010-09-13 13:29 ` Max Matveev 2010-09-13 20:39 ` Frank Ch. Eigler 2 siblings, 0 replies; 14+ messages in thread From: Greg Banks @ 2010-09-13 2:21 UTC (permalink / raw) To: Ken McDonell; +Cc: Frank Ch. Eigler, systemtap, pcp Ken McDonell wrote: > Apologies for my tardiness in responding, but I'm travelling at the > moment (typing this on a train and then on a ferry somewhere in Norway). > > > Sounds like fun :) > > > On 3/09/2010 5:39 AM, Frank Ch. Eigler wrote: > >> ... >> >> OK, then it looks like we'd have at least a few separate pieces to >> work on: >> >> * extensions to the PMCD<->PMDA API/protocol to allow PMDAs to push >> event data, and corresponding extensions for PMclients<->PMCD >> > > I'd really like to see some more discussion on how people think this is > going to work. None of the PCP libraries are thread-safe (again a > deliberate design decision at the original point of conception), I've made a brief survey of the places in the libpcp code which are not threadsafe; there's *lots* of them but most are easily fixed without breaking external interfaces. I'd estimate a few weeks' work is involved. I'm interested in helping on this for my own reasons (I'd like kmchart to be more robust when communication with pmdas is disrupted). > and > asynchronous delivery of data from pmdas through pmcd to clients > increases the likelihood that people will want to use multiple threads > to handle PCP calls. There are some asynchronous calls that were > grafted onto libpcp later on, but these have very little use in existing > code and no QA coverage. > > They're also a right bugger to program with, as we've discovered. I would be happy to see them deprecated in favour of full libpcp thread safety. -- Greg. ^ permalink raw reply [flat|nested] 14+ messages in thread
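Greg's point that most of the unsafe spots are fixable without breaking external interfaces can be illustrated with the classic offender, a routine that formats into a static buffer. This is a generic sketch, not actual libpcp code; one low-impact repair keeps the old signature and moves the buffer into thread-local storage.

```c
#include <stdio.h>

/*
 * Illustrative only, not libpcp code.  A typical non-thread-safe
 * pattern: an error-string routine formatting into a static buffer
 * shared by every thread in the process.
 */
char *errstr_unsafe(int code)
{
    static char buf[64];                 /* one buffer for all threads */
    snprintf(buf, sizeof(buf), "error %d", code);
    return buf;
}

/*
 * One fix that preserves the external interface: C11 _Thread_local
 * gives each thread its own buffer, so concurrent callers no longer
 * clobber each other's results.
 */
char *errstr_safe(int code)
{
    static _Thread_local char buf[64];   /* per-thread copy */
    snprintf(buf, sizeof(buf), "error %d", code);
    return buf;
}
```

An `_r` variant taking a caller-supplied buffer is the other standard option when thread-local storage is unavailable.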
* Re: [pcp] suitability of PCP for event tracing 2010-09-12 16:43 ` Ken McDonell 2010-09-13 2:21 ` Greg Banks @ 2010-09-13 13:29 ` Max Matveev 2010-09-13 20:53 ` Ken McDonell 2010-09-13 20:39 ` Frank Ch. Eigler 2 siblings, 1 reply; 14+ messages in thread From: Max Matveev @ 2010-09-13 13:29 UTC (permalink / raw) To: Ken McDonell; +Cc: Frank Ch. Eigler, systemtap, pcp On Mon, 13 Sep 2010 02:43:03 +1000, Ken McDonell wrote: kenj> Some of the suggestions to date include ... kenj> + data filtering predicates pushed from a client to pmcd and then on to kenj> a pmda to enable or restrict the types of events or conditions on event kenj> parameters that would be evaluated before asynchronously sending kenj> matching events to the client How would that work if multiple clients request mutually exclusive predicates? kenj> + additional per-client state data being held in pmcd to allow rate kenj> aggregation (and similar temporal averaging) to be done at pmcd, rather kenj> than the client [note I have a long-standing objection to this approach kenj> based on the original design criteria that pmcd needs to be mean and kenj> lean to reduce impact, and data reduction and analysis should be pushed kenj> out to the clients where the computation can be done without impacting kenj> the system being monitored ... but maybe it is time to revisit this, as kenj> the current environments where PCP is being used may differ from those kenj> we were concerned with in 1994] Recent (2010) experience on a 3rd rate platform suggests that this is still an issue - doing too much calculation in pmda is adding to the time it takes to fetch the data; unless pmcd can magically hide delays induced by calculations it has to make or calculations made by pmda, ALL clients suffer.
kenj> Depending on the set of goals we agree on, there may even be a place to kenj> consider maintaining the poll-based mechanism, but the export data is a kenj> variable length buffer of all event records (each aggregated and kenj> self-identifying as below) seen since the last poll-based sample. How will this work with multiple clients? Will the clients get a "snap time" to indicate when pmda updated its buffer or will pmda need to remember the state of each client (which would mean dragging client information into pmda, maintaining that information and somehow retiring per-client state without affecting the clients). max ^ permalink raw reply [flat|nested] 14+ messages in thread
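One plausible answer to Max's mutually-exclusive-predicates question can be sketched with invented, much-simplified types (nothing like this exists in PCP today): install the boolean OR of all client predicates at the event source, then apply each client's own predicate when fanning matching events out.

```c
#include <stddef.h>

/* Hypothetical per-client event filter: returns nonzero if the client
 * wants this event.  The event and predicate types are invented for
 * this sketch. */
struct event { int type; long long value; };
typedef int (*predicate)(const struct event *);

/* Two sample client predicates. */
static int wants_type1(const struct event *ev) { return ev->type == 1; }
static int wants_big(const struct event *ev)   { return ev->value > 1000; }

/* Union filter installed at the event source: fire the instrumentation
 * if ANY registered client is interested (boolean OR of predicates). */
int union_wants(const struct event *ev, predicate *preds, size_t npreds)
{
    for (size_t i = 0; i < npreds; i++)
        if (preds[i] && preds[i](ev))
            return 1;
    return 0;
}

/* Fan-out: deliver the event only to clients whose own predicate holds.
 * deliver[] receives 1/0 per client; returns how many matched. */
size_t fan_out(const struct event *ev, predicate *preds, size_t npreds,
               int *deliver)
{
    size_t matched = 0;
    for (size_t i = 0; i < npreds; i++) {
        deliver[i] = (preds[i] && preds[i](ev)) ? 1 : 0;
        matched += (size_t)deliver[i];
    }
    return matched;
}
```

Mutually exclusive predicates are then unproblematic at this layer: the union fires for either, and the fan-out routes each event only to the client that asked for it. The hard case, as the thread notes, is an underlying mechanism that can honour only one filter at a time.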
* Re: [pcp] suitability of PCP for event tracing 2010-09-13 13:29 ` Max Matveev @ 2010-09-13 20:53 ` Ken McDonell 0 siblings, 0 replies; 14+ messages in thread From: Ken McDonell @ 2010-09-13 20:53 UTC (permalink / raw) To: Max Matveev; +Cc: Frank Ch. Eigler, systemtap, pcp On 13/09/2010 11:29 PM, Max Matveev wrote: > On Mon, 13 Sep 2010 02:43:03 +1000, Ken McDonell wrote: > > kenj> Some of the suggestions to date include ... > > kenj> + data filtering predicates pushed from a client to pmcd and then on to > kenj> a pmda to enable or restrict the types of events or conditions on event > kenj> parameters that would be evaluated before asynchronously sending > kenj> matching events to the client > > How would that work if multiple clients request mutually exclusive > predicates? I wonder about this also. But my initial guess would be that predicates from different clients could be combined with a boolean OR to register the union of the events of interest, with individual predicates then applied to the stream to produce the events for each client. If the underlying event mechanism cannot support this, e.g. some process based cpu event registers where only one process at a time can be traced, then "first in best dressed" is probably the only protocol that will work. > kenj> + additional per-client state data being held in pmcd to allow rate > kenj> aggregation (and similar temporal averaging) to be done at pmcd, rather > kenj> than the client [note I have a long-standing objection to this approach > kenj> based on the original design criteria that pmcd needs to be mean and > kenj> lean to reduce impact, and data reduction and analysis should be pushed > kenj> out to the clients where the computation can be done without impacting > kenj> the system being monitored ...
but maybe it is time to revisit this, as > kenj> the current environments where PCP is being used may differ from those > kenj> we were concerned with in 1994] > > Recent (2010) experience on a 3rd rate platform suggests that this is > still an issue - doing too much calculation in pmda is adding to the > time it takes to fetch the data, unless pmcd can magically hide delays > induced by calculations it has to make or calculations made by pmda > ALL clients suffer. Yep that is always a risk ... and as a general rule I'd like to see collection being restricted at the pmcd site (e.g. don't probe for stats that no one is asking for) and aggregated processing like rate calculations moved out to the clients. > kenj> Depending on the set of goals we agree on, there may even be a place to > kenj> consider maintaining the poll-based mechanism, but the export data is a > kenj> variable length buffer of all event records (each aggregated and > kenj> self-identifying as below) seen since the last poll-based sample. > > How will this work with multiple clients? Will the clients get a "snap > time" to indicate when pmda updated its buffer or will pmda need to > remember the state of each client (which would mean dragging client > information into pmda, maintaining that information and somehow > retiring per-client state without affecting the clients). > I think it would need per-client state in the pmda (which assumes a protocol change between pmcd and the pmdas) ... the implementation might involve a dynamic ring buffer, sequence numbers and last fetched sequence number per client. The advantage is we maintain polled and synchronous protocols between the clients and pmcd ... it is exactly these sorts of pros and cons that I'd like to see us discussing. ^ permalink raw reply [flat|nested] 14+ messages in thread
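Ken's ring-buffer idea, a bounded event ring with monotonically increasing sequence numbers and a last-fetched sequence per client, can be sketched in a few lines. All names here are invented; real event records and the pmcd/pmda protocol change are elided.

```c
#include <stddef.h>

/*
 * Sketch of per-client polled delivery: the pmda keeps a bounded ring
 * of events tagged by sequence number; each client remembers the last
 * sequence it fetched, so a poll returns only newer events.
 */
#define RING_SLOTS 8

struct ring {
    long long events[RING_SLOTS];   /* stand-in for real event records */
    unsigned long next_seq;         /* sequence number of the next event */
};

void ring_push(struct ring *r, long long ev)
{
    r->events[r->next_seq % RING_SLOTS] = ev;
    r->next_seq++;
}

/*
 * Copy events newer than *last_seq into out[] (capacity max), oldest
 * first.  Events that fell off the ring before the client polled are
 * silently skipped; a real design would need to report that overrun.
 * Advances *last_seq and returns the count copied.
 */
size_t ring_fetch(const struct ring *r, unsigned long *last_seq,
                  long long *out, size_t max)
{
    unsigned long lo = *last_seq;
    if (r->next_seq - lo > RING_SLOTS)      /* overrun: drop lost events */
        lo = r->next_seq - RING_SLOTS;
    size_t n = 0;
    for (unsigned long s = lo; s < r->next_seq && n < max; s++)
        out[n++] = r->events[s % RING_SLOTS];
    *last_seq = lo + n;
    return n;
}
```

The overrun branch is exactly the trade-off under discussion: a slow client loses events rather than forcing the pmda to buffer without bound, and retiring a dead client is just forgetting its sequence number.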
* Re: [pcp] suitability of PCP for event tracing 2010-09-12 16:43 ` Ken McDonell 2010-09-13 2:21 ` Greg Banks 2010-09-13 13:29 ` Max Matveev @ 2010-09-13 20:39 ` Frank Ch. Eigler 2 siblings, 0 replies; 14+ messages in thread From: Frank Ch. Eigler @ 2010-09-13 20:39 UTC (permalink / raw) To: Ken McDonell; +Cc: pcp, systemtap Hi, Ken - > >(2) protocol extensions for live-push on pmda and pmcd-client interfaces > > This clearly larger effort is only worth undertaking with the > > community's sympathy and assistance. It might have some > > interesting integration possibilities with the other tools, > > especially pmie (the inference engine). > > I'd like to go back to a comment Nathan made at the start of this > thread, namely to try and get a clear idea of the problem we're trying > to solve here and the typical use cases. [...] I guess the basic idea is to allow a single client tool to be able to draw & analyze both gross performance metrics and the underlying events that explain those metrics. > Some of the suggestions to date include ... > > + being able to push data from pmcd asynchronously to clients, as > opposed to the time-based pulling from the clients that we support today Yes: > [later:] Depending on the set of goals we agree on, there may even > be a place to consider maintaining the poll-based mechanism, but the > export data is a variable length buffer of all event records (each > aggregated and self-identifying as below) seen since the last > poll-based sample. [...] As Max says, this would seem to require keeping some client state and buffers in pmcd and/or pmda, to avoid missing events between consecutive calls. Instead of that, I'm starting to sketch out a hybrid scheme that, on the pmapi side, is represented like this. (Please excuse the inclusion of actual code. It makes things more concrete and easier to discuss.)
------------------------------------------------------------------------ /* * Callback function from pmWatch(), supplying zero or more pmResult rows * accumulated during this pmWatch() interval. The first argument gives * number of pmResults in the second argument. The third argument is * a generic data pointer passed through from pmWatch(). * * The function should not call pmFreeResult() on the incoming values. * The function may return 0 to indicate its desire to continue watching, * or a non-zero value to abort the watch. This value will be returned * from pmWatch. */ typedef int (*pmWatchCallBack)(int resCount, const pmResult ** results, void * data); /* * Fetch metrics periodically, as if pmFetch() was called at the given * poll interval (if any). First few parameters are as for pmFetch(). * Each pmFetch() result is supplied via the given callback function. * The callback function can consume the data, and return a value * to dictate whether the polling loop is to continue or stop. * * In addition, if a PMDA pushes discrete metric updates during this * watch period, the callback function will be invoked more frequently. * (Other metric slots will have a NULL pmResult->vset[].) * * If given, approximately every poll interval, the callback function * is called (possibly with a zero resCount) to give the application a * chance to quit the loop. */ extern int pmWatch(int, pmID *, pmWatchCallBack fn, void * data, const struct timeval *pollInterval, const struct timeval *timeoutInterval); ------------------------------------------------------------------------ So a pmapi client would make a single long-duration pmWatch call to libpcp. libpcp calls back into the application periodically (to poll normal metric values) or whenever discrete events arrive. Eventually the app says "enough" by returning the appropriate rc. At the pmda.h or PDU side, I don't have a corresponding sketch yet. 
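The callback contract proposed above can be exercised with a toy stand-in: no libpcp, and plain ints in place of pmResult batches, but the same rule that the watch loop runs until the callback returns non-zero, and that value becomes the return value of the watch call. Everything here is invented for illustration.

```c
/*
 * Toy model of the proposed pmWatch() callback contract.  "Results"
 * are plain ints standing in for pmResult batches; the only behaviour
 * modelled is: keep invoking the callback until it returns non-zero,
 * then return that value from the watch loop.
 */
typedef int (*watch_cb)(int resCount, const int *results, void *data);

int toy_watch(const int *batches, const int *batch_sizes, int nbatches,
              watch_cb fn, void *data)
{
    const int *p = batches;
    for (int i = 0; i < nbatches; i++) {
        int rc = fn(batch_sizes[i], p, data);
        if (rc != 0)
            return rc;          /* callback asked to stop watching */
        p += batch_sizes[i];
    }
    return 0;                   /* watch ran to completion */
}

/* Example callback: accumulate incoming values into *data and ask to
 * stop once the running total exceeds 100. */
int sum_until_100(int resCount, const int *results, void *data)
{
    long *total = data;
    for (int i = 0; i < resCount; i++)
        *total += results[i];
    return (*total > 100) ? 1 : 0;
}
```

A real implementation would also interleave timer-driven empty callbacks (the "possibly with a zero resCount" clause above) so the application can bail out even when no events arrive.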
I wonder if we could permit multithreading just for the corresponding parts of the API: pmcd->pmda (*pmdaInterface.version.five.watch)(..., callbackFn, cbKey, ...); # pmda spawns a new thread, sets it up => key (thread-id) pmda thread2 (*callbackFn) (n, "event data pmResult" [array], cbKey, ...) pmcd->pmda (*pmdaInterface.version.five.unwatch)(key); # pmda kills thread2 => void to register an interest in metrics with the PMDA, have a new thread call back into PMCD only to supply new data via a dedicated function, then eventually unregister. This may require only relatively small parts of libpcp/pcp_pmda to be made thread-safe. > + data filtering predicates pushed from a client to pmcd and then on to > a pmda to enable or restrict the types of events or conditions on event > parameters that would be evaluated before asynchronously sending > matching events to the client Right. This would represent a pure performance optimization if there were only a single concurrent client. With more than one, a filtering algebra would be needed. I don't have a sketch for this yet. > + handling event records with associated event parameters as an extended > data type Right. Hiding JSON or somesuch in a string is probably OK, unless we want to reify filtering and inferencing upon them. > + additional per-client state data being held in pmcd to allow rate > aggregation (and similar temporal averaging) to be done at pmcd, rather > than the client [note I have a long-standing objection [...] I guess it depends on what we could be saving by having pmcd perform such conversions instead of clients. Client-side CPU and storage seems cheaper than network traffic, if the data reduction is moderate, but if it's high, it's probably the other way. (In the systemtap model, we encourage users to filter events aggressively at the source, which turns the data firehose into a dribble.
To exploit this fully in the pcp-intermediated world though, we'd have to pass filtering parameters through.) > + better support for web-based monitoring tools (although Javascript > evolution may make this less pressing than it was 5 years ago) Right, at this point it seems like a fatter javascript app should be able to do this job without pmcd help; the web app just needs to access the pmapi (through a proxy if necessary). > + better support for analysis that spans the timeline between the > current and the recent past This sounds like useful but future work. Until it is done, we could have clients perform archive-vs-live data merging on their own, or else have the users start clients early enough to absorb the "recent past" data as live. > Returning to Frank's point, I'm not sure pmie would be able to consume > asynchronous events ... [...] That's OK, it should at worst ignore such events. At best, in the future, it could gain some more general temporal/reactive-database type facilities to do something meaningful. - FChE ^ permalink raw reply [flat|nested] 14+ messages in thread
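For scale, the rate conversion the thread debates placing in pmcd versus the client is only a few lines of arithmetic wherever it lives. A sketch with invented names follows; in real PCP the units and counter semantics travel in pmDesc and the timestamps in pmResult.

```c
/*
 * Client-side rate conversion from two counter samples, the kind of
 * computation debated above for pmcd vs. the client.  Names are
 * invented for this sketch.
 */
double counter_rate(unsigned long long prev, unsigned long long cur,
                    double prev_ts, double cur_ts)
{
    double dt = cur_ts - prev_ts;
    if (dt <= 0.0)
        return -1.0;                 /* bad or duplicate timestamps */
    /* Unsigned subtraction yields the correct delta across a single
     * 64-bit counter wrap. */
    return (double)(cur - prev) / dt;
}
```

The cost argument in the thread is not about this arithmetic but about where the two samples and the per-client previous-value state are stored.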
end of thread, other threads:[~2010-09-13 20:53 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-08-27 15:39 suitability of PCP for event tracing Frank Ch. Eigler 2010-08-29 15:55 ` [pcp] " Ken McDonell 2010-09-01 15:05 ` David Smith 2010-09-06 16:39 ` Ken McDonell [not found] ` <4C7A7DFE.2040606@internode.on.net> 2010-08-31 3:29 ` Greg Banks 2010-08-31 19:49 ` Frank Ch. Eigler 2010-09-01 6:25 ` Mark Goodwin 2010-09-02 2:05 ` Greg Banks 2010-09-02 19:40 ` Frank Ch. Eigler 2010-09-12 16:43 ` Ken McDonell 2010-09-13 2:21 ` Greg Banks 2010-09-13 13:29 ` Max Matveev 2010-09-13 20:53 ` Ken McDonell 2010-09-13 20:39 ` Frank Ch. Eigler