Re: Proposed systemtap access to perfmon hardware

From: Maynard Johnson <maynardj@us.ibm.com>
To: William Cohen <wcohen@redhat.com>
Cc: SystemTAP <systemtap@sources.redhat.com>
Subject: Re: Proposed systemtap access to perfmon hardware
Date: Wed, 22 Mar 2006 22:16:00 -0000
Message-ID: <4421CCAA.4040501@us.ibm.com>
In-Reply-To: <442190D3.7060204@redhat.com>

William Cohen wrote:
> Maynard Johnson wrote:
> 
>> William Cohen wrote:
>>
>>> [snip]
>>>
>>> perfmon_create_context:long ()
>>>
>>> The perfmon_create_context command sets up the performance monitoring
>>> hardware for the allocated contexts and starts the counters running.
>>> If successful, the function will return zero. If the operation is
>>> unsuccessful, an error code will be returned. This function
>>> should only be used in probe begin. (FIXME list error code returned.)
>>>  
>>>
>> I'm confused about the relationship between this function and 
>> perfmon_start_counter, since starting the counters is mentioned in 
>> both.  Could you explain at what point this function is invoked and 
>> what the purpose of the context is?  I'm not real familiar with the 
>> perfmon2 interface, but just on the face of it, your context doesn't 
>> seem like a one-to-one fit with the way contexts are used in 
>> perfmon2.  In perfmon2, a context is created first, which is then 
>> passed in to the calls for setting up events, thereby associating 
>> those events with the context. Then 'start' uses the context to set up 
>> the PMU for all requested events and begin the counting.
> 
> 
> Yes, perfmon2 has a context that sets all the performance monitoring 
> hardware registers. The perfmon2 start and stop control the entire context.
> 
> Based on the feedback from the earlier proposal email, I have revised 
> this to use something like:
> 
> probe perfmon.event("blah") ...
> 
> All the probes using the perfmon hardware would be collected together 
> for the perfmon_create_context. 
This is good.
> The individual start and stop operations would be allowed. 
This is not so good.  Besides the fact that it may be difficult (or 
impossible) to do, I don't see it being all that useful.  But then, I'm 
a tool developer, not a performance analyst, so I could be missing the 
point.

> It is an open question what the counters' default state is: 
> do they start running by default or have to be explicitly started. If 
> they are started by default, where exactly are they running? Beginning 
> of begin probe? End of begin probe?
> 
>>>
>>> [snip]
>>>
>>> perfmon_start_counter:long (event_handle:long)
>>>
>>> The event_handle passed in indicates which counter to start. The value
>>> is returned as a 64-bit long of the current counter value.  The return
>>> value is undefined for an invalid event_handle.
>>>  
>>>
>> I think individually starting counters is problematic at a couple 
>> different levels.  On some architectures (like PowerPC64), you don't 
>> have fine-grained control over each counter.  Also, one usually wants 
>> all counters to begin counting at the same time.  Maybe I'm 
>> misinterpreting what the intention of this function is.
> 
> 
> I was thinking there are cases where one would want to start and stop 
> individual sampling and interval counting. Yes, starting and stopping 
> counters on some architectures can be a problem.  I was thinking of 
> cheating and not actually starting and stopping the counters, but rather 
> turning on and off the bits that enable counting in user and kernel 
> space. Do this by finding which bits to twiddle in the control register. 
Unfortunately, this isn't possible for ppc64.  The control bits you 
mention (for user/kernel domain) are used for all counters, so there's 
no fine-grained control there.  There are PMCxSEL bits for setting up 
each counter for what you want it to count (including "count nothing"), 
but changing these on the fly (i.e., without disabling the PMU) may not 
have the desired effect.  The documentation states that you should first 
disable the PMU before you change these bits, but it doesn't say what 
would happen if you didn't disable.
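
To illustrate the first point, here is a toy model, not real PMU code. It
assumes a PMU whose user/kernel enable bits are shared by every counter;
the class and field names (ToyPMU, count_user, count_kernel) are invented
for the illustration. Clearing the shared user-mode bit in an attempt to
"stop" one counter stops all of them.

```python
class ToyPMU:
    """Toy model: one shared control register gates every counter.

    Hypothetical names throughout; this does not describe any real
    register layout.
    """

    def __init__(self, n_counters):
        self.count_user = True      # shared bit: count in user mode
        self.count_kernel = True    # shared bit: count in kernel mode
        self.counters = [0] * n_counters

    def tick(self, in_kernel):
        """Deliver one event to every counter, subject to the shared bits."""
        enabled = self.count_kernel if in_kernel else self.count_user
        if enabled:
            for i in range(len(self.counters)):
                self.counters[i] += 1


def demo():
    pmu = ToyPMU(2)
    pmu.tick(in_kernel=False)   # both counters advance
    pmu.count_user = False      # attempt to "stop counter 0" only
    pmu.tick(in_kernel=False)   # neither counter advances
    return pmu.counters
```

Here demo() returns [1, 1]: the second event is lost on both counters,
not just the one we meant to stop.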

-Maynard
> However, maybe this won't work for ppc64. I will have to review the 
> ppc64 hardware manual to see whether this scheme would work.
> 
>>> [snip]
>>>
>>
>>> EVENT SPECIFICATION
>>>
>>> The performance monitoring events are specified in strings. The
>>> information at the very least includes the event name being monitored
>>>  
>>>
>> Will, you allude to this in a later posting, but I'll reiterate here.  
>> Should the event name be the native event name for the arch?  Or some 
>> generic name that is mapped to a native name by some mechanism?  Or 
>> either (as in PAPI)?
> 
> 
> libpfm has some generic names for cycle counts. I expect that events 
> will be both generic names and architecture specific. This will be a 
> lookup in libpfm.
> 
>>> by the counter.  Additional information would include an event mask to
>>> specify subevents, whether to count in kernel or user space, whether
>>> to keep track of counts on a per thread or per CPU basis, and the
>>> interval for the sampling.
>>>
>>> (FIXME more detail on the string compositions)
>>>
>>>
>>> SYSTEMTAP PERFORMANCE HARDWARE ACCESS IMPLEMENTATION
>>>
>>> SystemTap access to the performance monitoring hardware is planned to be
>>> built on the perfmon2 kernel support. perfmon2 provides reservation
>>> and access to the performance monitoring hardware on ia64, i386, and
>>> PowerPC processors. The perfmon2 support is not yet in the upstream
>>> kernels, but patches are available.
>>>  
>>>
>> As a proof of concept, I agree that this is the best route.  
>> Reinventing the wheel would be useless.  Maybe building this prototype 
>> might help with refining the perfmon2 interface.
> 
> 
> I have been working on patching oprofile so that it uses the perfmon2 
> interface. The work is being done on an amd64 machine. This should allow 
> some examination of the mechanisms for setting up the events and 
> sampling. It should be portable to perfmon2 for i386, ppc64, and ia64. I 
> will make the patches available for comment.
> 
> The next step would be to prototype a similar operation for systemtap.
> 
> I am trying to avoid reinventing the wheel. I am also very concerned 
> that raw access of the performance monitoring hardware will further 
> increase the chances of multiple device drivers stepping on each other 
> without knowing about it.
> 
> -Will
