From: William Cohen <wcohen@redhat.com>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: systemtap@sources.redhat.com
Subject: Re: Proposed systemtap access to perfmon hardware
Date: Fri, 17 Mar 2006 20:26:00 -0000	[thread overview]
Message-ID: <441B1B5E.8090401@redhat.com> (raw)
In-Reply-To: <y0mslph7z6w.fsf@ton.toronto.redhat.com>

Frank Ch. Eigler wrote:
> wcohen wrote:
> 
> 
>>To try to get a feel for how the performance monitoring hardware
>>support would work in SystemTap, I wrote some simple examples.
> 
> 
> Nice work.  To flesh out the operational model (and please correct me
> if I'm wrong): the way this stuff would all work is:
> 
> - The systemtap translator would be linked with libpfm from perfmon2.
>   (libpfm license is friendly.)

The libpfm library is under an MIT license, so it should be 
compatible with the systemtap licensing.

> - This library would be used at translation time to map perfmon.* probe
>   point specifications to PMC register descriptions (pfmlib_output_param_t).
>   (This will require telling the system the exact target cpu type for
>   cross-instrumentation.)

Yes, this complicates cross-instrumentation (building the 
instrumentation on one system and running it on another), since the 
two machines could use different processor architectures. Some 
performance monitoring systems, such as PAPI, provide mappings for 
generic event names, which might help in some cases. However, some 
differences in computer architecture just do not translate to the 
generic models.
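For concreteness, the translation-time mapping might look roughly like
the sketch below. This is untested and based on my reading of the
libpfm API (pfm_initialize, pfm_find_event, pfm_dispatch_events, and
the pfmlib_output_param_t mentioned above); treat it as an outline,
not working code:

    #include <stdio.h>
    #include <string.h>
    #include <perfmon/pfmlib.h>

    /* Sketch: resolve an event name to PMC register assignments
     * at translation time, then emit them into the generated C. */
    int map_event_to_pmcs(const char *name)
    {
        pfmlib_input_param_t inp;
        pfmlib_output_param_t outp;
        unsigned int i, ev;

        if (pfm_initialize() != PFMLIB_SUCCESS)
            return -1;

        memset(&inp, 0, sizeof(inp));
        memset(&outp, 0, sizeof(outp));

        if (pfm_find_event(name, &ev) != PFMLIB_SUCCESS)
            return -1;      /* no such event on this CPU model */

        inp.pfp_event_count = 1;
        inp.pfp_events[0].event = ev;
        inp.pfp_dfl_plm = PFM_PLM3;     /* count at user level */

        if (pfm_dispatch_events(&inp, NULL, &outp, NULL) != PFMLIB_SUCCESS)
            return -1;      /* could not assign event to a register */

        /* Emit the register assignments into the generated module. */
        for (i = 0; i < outp.pfp_pmc_count; i++)
            printf("/* PMC%u = 0x%llx */\n",
                   outp.pfp_pmcs[i].reg_num,
                   (unsigned long long)outp.pfp_pmcs[i].reg_value);
        return 0;
    }

(For cross-instrumentation the translator would somehow need to tell
libpfm the target CPU model instead of letting it detect the host's.)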

> - These descriptions would be emitted into the C code, for actual
>   installation during module initialization.  For our first cut, since
>   there appears to exist no kernel-side management API at the moment,
>   the C code would directly manipulate the PMC registers.  (This means
>   no coexistence for oprofile or other concurrent perfctr probing.
>   C'est la vie.)

I would prefer to reuse other software to access the performance 
monitoring hardware rather than generate yet another piece of software 
that programs it directly. We want 64-bit counter values, but a number 
of the hardware counters are much smaller than that (e.g. 32-bit), so 
the generated code would have to widen them in software. On the 
Pentium 4 the access to the performance counters is particularly 
complicated, and I would prefer not to reinvent that code. This 
mechanism would also only work with a global, system-wide setup; 
things like per-thread sampling would be unsupported. We also need to 
translate between event names and event numbers; the tables in 
OProfile and perfmon are getting pretty large to keep all that 
information and to catch any inability to map events to a register.

One advantage of generating the C code would be that it would work 
with the existing RHEL4 kernels.
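To make the "directly manipulate the PMC registers" concern concrete,
here is the flavor of code the translator would otherwise have to emit
and maintain per CPU family. This is an illustrative sketch for a
P6-class x86 processor (MSR numbers from the IA-32 manuals; the
counter is treated as 32 usable bits purely for the sake of the
example), including the software widening to 64 bits:

    #include <asm/msr.h>

    #define PERFEVTSEL0 0x186   /* P6 event-select MSR */
    #define PERFCTR0    0xc1    /* P6 counter MSR */

    static u64 virt_counter;    /* 64-bit virtual counter */
    static u32 last_raw;        /* previous raw hardware reading */

    static void setup_counter(u32 evtsel)
    {
        wrmsr(PERFCTR0, 0, 0);          /* clear the hardware counter */
        wrmsr(PERFEVTSEL0, evtsel, 0);  /* select event and enable */
        last_raw = 0;
    }

    static u64 read_counter(void)
    {
        u32 lo, hi;

        rdmsr(PERFCTR0, lo, hi);
        /* Accumulate the delta; unsigned arithmetic absorbs one
         * wraparound of the narrow hardware counter between reads. */
        virt_counter += (u32)(lo - last_raw);
        last_raw = lo;
        return virt_counter;
    }

The Pentium 4 version of this (ESCRs and CCCRs) is far messier, which
is exactly why I would rather reuse existing code.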

> - The "sample" type perfmon probes would map to the same kind of
>   dispatch/callback as the current "timer.profile": the probe handler
>   should have valid pt_regs available.

Yes, a valid pt_regs will be available to the sample-type probe handler.
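In other words, the generated dispatch would look something like the
sketch below (hypothetical glue; the actual hookup to the PMU overflow
interrupt is elided):

    #include <linux/ptrace.h>   /* struct pt_regs, instruction_pointer() */

    /* Sketch: the PMU overflow interrupt would invoke this, analogous
     * to the timer.profile dispatch, so the probe handler sees the
     * interrupted context. */
    static void perfmon_sample_dispatch(struct pt_regs *regs)
    {
        unsigned long pc = instruction_pointer(regs);

        /* ... hand pc/regs to the translated probe handler ... */
    }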

> - The free-running type perfmon probes, probably named
>   "perfctr.SPEC.setup" or ".start" or ".begin" would map to a one-time
>   initialization that passes a token (PMC counter number?)  to the
>   handler.  Other probe handlers can then query/manipulate the
>   free-running counter using that number via the start/stop/query
>   functions.
>
> Is that sufficiently detailed to begin an implementation?

Pretty close. The one thing that isn't answered is the division of 
labor for the sampling probes between the one-time setup and the 
sample handler. I want to have a handle set in a global variable for 
the probe, but do not want to execute that setup every time a sample 
is collected. For the free-running probes it is pretty clear how to 
handle things.
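Roughly, I picture the runtime interface for the free-running case
looking like this (the names here are placeholders I am making up,
not an agreed API):

    /* Hypothetical runtime helpers: the setup probe calls
     * _stp_perf_setup() once, stores the returned token in a script
     * global, and the other probe handlers use the token from then
     * on to start/stop/query the free-running counter. */
    int  _stp_perf_setup(const char *event);  /* token, or <0 on error */
    void _stp_perf_start(int token);
    void _stp_perf_stop(int token);
    u64  _stp_perf_read(int token);           /* 64-bit virtual count */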

>>[...] print ("ipc is %d.%d \n", ipc/factor, ipc % factor);
> 
> 
> (An aside: we should have a more compact notation for this.  We won't
> support floating point numbers, but integers can be commonly scaled
> like this.  Maybe printf("%.Nf", value), where N implies a
> power-of-ten scaling factor, and printf("%*f", value, scale) for
> general factors.)

Yes, a scaling mechanism like that would be nice in some cases. IPC 
values are likely to be near one, so I put in the scaling to give a 
better picture of what is going on.
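Such a notation could be lowered to pure integer arithmetic; here is a
sketch of what the proposed printf("%.Nf", value) could expand to
(non-negative values assumed):

    #include <stdio.h>

    /* Print value / 10^n with n digits after the decimal point,
     * using integers only: print_scaled(125, 2) prints "1.25". */
    static void print_scaled(long long value, int n)
    {
        long long factor = 1;
        int i;

        for (i = 0; i < n; i++)
            factor *= 10;
        printf("%lld.%0*lld\n", value / factor, n, value % factor);
    }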

-Will

