From: William Cohen
Date: Wed, 15 Mar 2006 16:24:00 -0000
To: SystemTAP
Subject: Proposed systemtap access to perfmon hardware

I have written up material describing how I would think that systemtap could use the performance monitoring hardware.
It is a work in progress, but I would appreciate people's comments on it.

-Will

[Attachment: stapperfmon.txt]

Systemtap Performance Monitoring Hardware Support Proposal
March 15, 2006

Most modern processors have performance monitoring hardware that can count events such as processor clock cycles, memory references, cache misses, branches, and branch mispredictions. The hardware counts can be used directly to gauge the cost of operations, or they can be used to trigger sampling to find out where in the code these operations occur. SystemTap should be able to use this performance monitoring hardware to identify the underlying causes of performance problems.

SYSTEMTAP PERFORMANCE MONITORING API

perfmon_allocate_counter:long (event_spec:string)

All perfmon_allocate_counter() calls must be in the probe begin (removing this restriction will be considered later). The event_spec string, in the format described in the EVENT SPECIFICATION section, gives the performance counter configuration for the event. If the configuration is successful, an event_handle in the form of a non-zero 64-bit value is returned; a zero value indicates a problem with the counter allocation. The other functions use this event_handle to uniquely identify the counter. The counters are not set up or running until perfmon_create_context() is called.

perfmon_free_counter:long (event_handle:long)

All perfmon_free_counter() calls must be in the probe end (removing this restriction will be considered later). The function returns the event_handle for a successful free operation and zero for an unsuccessful one.

perfmon_create_context:long ()

perfmon_create_context() sets up the performance monitoring hardware for the allocated counters and starts the counters running.
If successful, the function returns zero; if the operation is unsuccessful, an error code is returned. This function should only be used in the probe begin. (FIXME list error codes returned.)

perfmon_get_counter:long (event_handle:long)

The event_handle passed in indicates which counter to read. The current counter value is returned as a 64-bit long; the counter may be either running or stopped. The return value is undefined for an invalid event_handle.

perfmon_start_counter:long (event_handle:long)

The event_handle passed in indicates which counter to start. The current counter value is returned as a 64-bit long. The return value is undefined for an invalid event_handle.

perfmon_stop_counter:long (event_handle:long)

The event_handle passed in indicates which counter to stop. The current counter value is returned as a 64-bit long. The return value is undefined for an invalid event_handle.

perfmon_handle_to_string:string (event_handle:long)

perfmon_handle_to_string() returns the event_spec string that was passed to perfmon_allocate_counter() to generate the handle.

probe kernel.perfmon.sample(event_handle:long) {/*body*/}

The kernel.perfmon.sample probe specifies the action to take when the counter identified by event_handle overflows. Because the overflow can occur at any time, the context information available to the probe is limited to the same data available to an asynchronous timer probe. The event_handle is a global variable in the instrumentation script. Multiple probes for the same global variable are allowed.

EVENT SPECIFICATION

The performance monitoring events are specified as strings. At a minimum, the string must include the name of the event monitored by the counter. Additional information would include an event mask to select subevents, whether to count in kernel or user space, whether to keep track of counts on a per-thread or per-CPU basis, and the sampling interval.
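The string syntax itself is still open. As a minimal sketch only, assuming a hypothetical colon-separated syntax NAME[:umask=N][:kernel][:user][:per_thread|:per_cpu][:period=N] that is not part of this proposal, the Python below models how perfmon_allocate_counter() might turn such a string into a non-zero handle, and how perfmon_free_counter() and perfmon_handle_to_string() relate to that handle. EventSpec, parse_event_spec, CounterTable, the four-counter limit, and the example event name CPU_CLK_UNHALTED are illustrative assumptions, not a defined interface.

```python
from dataclasses import dataclass


@dataclass
class EventSpec:
    """Hypothetical parsed form of an event specification string."""
    name: str                 # event to count (required)
    umask: int = 0            # event mask selecting subevents
    kernel: bool = False      # count in kernel space
    user: bool = False        # count in user space
    per_thread: bool = False  # False means per-CPU counts
    period: int = 0           # sampling interval; 0 means count only


def parse_event_spec(spec):
    """Parse the assumed syntax
    NAME[:umask=N][:kernel][:user][:per_thread|:per_cpu][:period=N].
    Raises ValueError for a malformed string."""
    name, *fields = spec.split(":")
    if not name:
        raise ValueError("an event name is required")
    ev = EventSpec(name)
    for field in fields:
        key, _, value = field.partition("=")
        if key == "umask" and value:
            ev.umask = int(value, 0)   # base 0 accepts "0x41" or "65"
        elif key == "period" and value:
            ev.period = int(value, 0)
        elif field == "kernel":
            ev.kernel = True
        elif field == "user":
            ev.user = True
        elif field in ("per_thread", "per_cpu"):
            ev.per_thread = (field == "per_thread")
        else:
            raise ValueError("unknown field: %r" % field)
    if not (ev.kernel or ev.user):     # neither given: count in both
        ev.kernel = ev.user = True
    return ev


class CounterTable:
    """Models the proposed handle rules: allocate returns a non-zero handle
    on success and 0 on failure; free returns the handle or 0."""

    def __init__(self, max_counters=4):
        self.max_counters = max_counters   # assumed hardware limit
        self.slots = {}                    # handle -> (spec string, EventSpec)
        self.next_handle = 1               # 0 is reserved for "failed"

    def allocate(self, spec):
        if len(self.slots) >= self.max_counters:
            return 0
        try:
            parsed = parse_event_spec(spec)
        except ValueError:
            return 0
        handle = self.next_handle
        self.next_handle += 1
        self.slots[handle] = (spec, parsed)
        return handle

    def free(self, handle):
        return handle if self.slots.pop(handle, None) is not None else 0

    def handle_to_string(self, handle):
        # Raises KeyError for an unknown handle; the proposal does not
        # specify this case.
        return self.slots[handle][0]


table = CounterTable()
h = table.allocate("CPU_CLK_UNHALTED:umask=0x1:kernel:period=100000")
print(h, table.handle_to_string(h))  # 1 CPU_CLK_UNHALTED:umask=0x1:kernel:period=100000
print(table.allocate("CPU_CLK_UNHALTED:bogus"))  # 0 (unknown field)
print(table.free(h), table.free(h))  # 1 0
```

Handles are never reused in this sketch, so a stale handle held by a script cannot silently refer to a newly allocated counter.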
(FIXME more detail on the string composition.)

SYSTEMTAP PERFORMANCE HARDWARE ACCESS IMPLEMENTATION

SystemTap access to the performance monitoring hardware is planned to be built on the perfmon2 kernel support. perfmon2 provides reservation of and access to the performance monitoring hardware on ia64, i386, and PowerPC processors. The perfmon2 support is not yet in the upstream kernels, but patches are available.

Outline of where things are done:

In the translator:
  -group all probe kernel.perfmon.sample() together

In the perfmon tapset:
  perfmon_allocate_counter()
  perfmon_free_counter()
  perfmon_create_context()
  perfmon_get_counter()
  perfmon_start_counter()
  perfmon_stop_counter()
  perfmon_handle_to_string()

On startup (probe begin):
  -if perfmon.sample is used, register the perfmon custom buffer mechanism
  The following steps will need some work done in userspace (libpfm):
  -translate each perfmon_allocate_counter() request into a perfmon configuration
  -set up the perfmon contexts (either per processor or per pid)
  -activate the perfmon contexts

On shutdown (probe end):
  The following steps will need some work done in userspace (libpfm):
  -destroy the perfmon contexts
  -if perfmon.sample is used, unregister the perfmon custom buffer mechanism

FIXME more details on the proposed implementation.

SYSTEMTAP PERFMON ISSUES

-There are numerous constraints on event setup. It is possible to request a configuration that cannot be set up in the performance monitoring hardware.
-This mechanism does not provide access to other related information provided by the performance monitoring hardware, e.g. the ia64 performance monitoring registers that store the data address that caused a cache miss.
-Perfmon clones the context for new threads that have the perfmon context set up, but we probably do not want to attach to each existing thread and set up the context on it; that would be relatively expensive.
-Perfmon can do either global or per-thread monitoring, but the two modes cannot be mixed.
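The first issue above can be viewed as an assignment problem: an event is often restricted to a subset of the physical counters, and a set of requests can be programmed only if every event gets a distinct counter it is allowed to use. Below is a minimal Python sketch of such a feasibility check using augmenting-path bipartite matching. The constraint table ALLOWED_COUNTERS, its event names, and assign_counters are hypothetical; real constraint data could come from the processor documentation or from libpfm, and real hardware can impose further restrictions that this sketch ignores.

```python
# Hypothetical constraint table: event name -> counters it may use.
# The entries are invented for illustration.
ALLOWED_COUNTERS = {
    "CYCLES": {0, 1},
    "INSTRUCTIONS": {0, 1},
    "EVENT_ONLY_ON_0": {0},
    "EVENT_ONLY_ON_1": {1},
}


def assign_counters(events, allowed=ALLOWED_COUNTERS):
    """Return {event_index: counter} if every requested event can be given
    its own permitted counter, else None. Unknown events cannot be placed."""
    owner = {}  # counter -> index of the event currently assigned to it

    def try_place(i, seen):
        # Augmenting-path step: take a free permitted counter, or evict the
        # current owner if that owner can be moved to another counter.
        for ctr in sorted(allowed.get(events[i], ())):
            if ctr in seen:
                continue
            seen.add(ctr)
            if ctr not in owner or try_place(owner[ctr], seen):
                owner[ctr] = i
                return True
        return False

    for i in range(len(events)):
        if not try_place(i, set()):
            return None
    return dict(sorted((i, ctr) for ctr, i in owner.items()))


print(assign_counters(["CYCLES", "EVENT_ONLY_ON_0"]))           # {0: 1, 1: 0}
print(assign_counters(["EVENT_ONLY_ON_0", "EVENT_ONLY_ON_0"]))  # None
```

A greedy first-fit pass would reject the first request, because CYCLES would take counter 0 and leave nothing for EVENT_ONLY_ON_0; the augmenting-path search instead moves CYCLES to counter 1.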
REFERENCES

Stephane Eranian, "The perfmon2 interface specification," HP Laboratories, HPL-2004-200(R.1), February 7, 2005. http://www.hpl.hp.com/techreports/2004/HPL-2004-200R1.html