From: William Cohen
Date: Fri, 17 Mar 2006 20:26:00 -0000
To: "Frank Ch. Eigler"
CC: systemtap@sources.redhat.com
Subject: Re: Proposed systemtap access to perfmon hardware
Message-ID: <441B1B5E.8090401@redhat.com>
References: <44183FCF.6010809@redhat.com> <441AE1DE.2040207@redhat.com>

Frank Ch. Eigler wrote:
> wcohen wrote:
>
>>To try to get a feel on how the performance monitoring hardware
>>support would work in SystemTap I wrote some simple examples.
>
> Nice work.
> To flesh out the operational model (and please correct me
> if I'm wrong): the way this stuff would all work is:
>
> - The systemtap translator would be linked with libpfm from perfmon2.
>   (libpfm license is friendly.)

The libpfm library is under an MIT license, so it should be compatible
with the systemtap licensing.

> - This library would be used at translation time to map perfmon.* probe
>   point specifications to PMC register descriptions (pfmlib_output_param_t).
>   (This will require telling the system the exact target cpu type for
>   cross-instrumentation.)

Yes, this complicates the cross-kernel case (build the instrumentation
on one system and run it on another). Different processor architectures
could be used on each. Some performance monitoring systems such as PAPI
have mappings for some generic names. This might help in some cases.
However, there are some differences in computer architecture that just
do not translate to the generic models.

> - These descriptions would be emitted into the C code, for actual
>   installation during module initialization.  For our first cut, since
>   there appears to exist no kernel-side management API at the moment,
>   the C code would directly manipulate the PMC registers.  (This means
>   no coexistence for oprofile or other concurrent perfctr probing.
>   C'est la vie.)

I would prefer to reuse other software to access the performance
monitoring hardware. I don't want to generate yet another piece of
software that uses the performance monitoring hardware. We want 64-bit
values, but a number of the counters are much smaller than that
(32-bit). On the Pentium 4 the access to the performance counters is
complicated, and I would prefer not to reinvent the code that accesses
them. This mechanism will only work with a global setup; things like
per-thread sampling would be unsupported.
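The 64-bit versus 32-bit counter point can be made concrete with a small sketch. This is only an illustration, not code from SystemTap, perfmon or OProfile: the `wide_counter` structure and `wide_counter_update()` function are hypothetical names, and the raw reading is passed in as a plain argument rather than read from a real PMC register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software state kept for one narrow hardware counter. */
struct wide_counter {
    uint64_t total;     /* accumulated 64-bit event count */
    uint32_t last_raw;  /* raw 32-bit value seen at the previous update */
};

/*
 * Fold a new raw 32-bit reading into the 64-bit total.
 * The subtraction result is truncated to 32 bits, so the delta is
 * correct across a single wraparound. It is wrong if the hardware
 * counter wraps more than once between two updates, so the caller
 * has to update often enough (for example from an overflow interrupt).
 */
static uint64_t wide_counter_update(struct wide_counter *c, uint32_t raw)
{
    assert(c != NULL);
    uint32_t delta = raw - c->last_raw;
    c->total += delta;
    c->last_raw = raw;
    return c->total;
}
```

For example, if the previous raw value was 0xFFFFFFF0 and the next reading is 0x00000010, the function adds 32 events to the total even though the raw value went down.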
Also, we need to translate between the event name and the event number.
The tables in OProfile and perfmon are getting pretty large to keep all
that information and to catch any inability to map an event to a
register. One advantage of generating the C code would be that it would
work with the existing RHEL4 kernel.

> - The "sample" type perfmon probes would map to the same kind of
>   dispatch/callback as the current "timer.profile": the probe handler
>   should have valid pt_regs available.

Yes, the pt_regs will be available to the sample type probe.

> - The free-running type perfmon probes, probably named
>   "perfctr.SPEC.setup" or ".start" or ".begin" would map to a one-time
>   initialization that passes a token (PMC counter number?) to the
>   handler.  Other probe handlers can then query/manipulate the
>   free-running counter using that number via the start/stop/query
>   functions.
>
> Is that sufficiently detailed to begin an implementation?

Pretty close. The one thing that isn't answered is the division of
labor for the sampling probes: one-time setup vs. sample handler. We
want to have some handle set in a global variable for the probe, but do
not want to execute that setup every time a sample is collected. For
the free-running probes it is pretty clear how to handle the samples.

>>[...] print ("ipc is %d.%d \n", ipc/factor, ipc % factor);
>
> (An aside: we should have a more compact notation for this.  We won't
> support floating point numbers, but integers can be commonly scaled
> like this.  Maybe printf("%.Nf", value), where N implies a
> power-of-ten scaling factor, and printf("%*f", value, scale) for
> general factors.)

Yes, some scaling mechanism would be nice in some cases. It was pretty
likely that the IPC would be around one, so I put in the scaling to
give a better picture of what is going on.

-Will
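The integer scaling discussed in the last exchange can be sketched in plain C. This is an illustration only: `scaled_ipc()` and `format_scaled()` are hypothetical helper names, not SystemTap functions, and the sketch assumes the scale is a power of ten matching the number of fractional digits. One detail worth noting is that a bare "%d.%d" drops leading zeros in the fraction (1005 with a factor of 1000 would print as 1.5), so the fraction is zero-padded here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Instructions per cycle, multiplied by `scale` so that the fractional
 * part survives integer division. Hypothetical helper. */
static uint64_t scaled_ipc(uint64_t instructions, uint64_t cycles,
                           uint64_t scale)
{
    assert(cycles != 0);
    return instructions * scale / cycles;
}

/* Write value/scale into buf as "<integer>.<fraction>", zero-padding the
 * fraction to `digits` places. Assumes scale == 10^digits.
 * Returns what snprintf returns. Hypothetical helper. */
static int format_scaled(char *buf, size_t len, uint64_t value,
                         uint64_t scale, int digits)
{
    assert(scale != 0);
    return snprintf(buf, len, "%llu.%0*llu",
                    (unsigned long long)(value / scale), digits,
                    (unsigned long long)(value % scale));
}
```

With 1005 instructions over 1000 cycles and a scale of 1000, `scaled_ipc()` yields 1005 and `format_scaled()` with three digits writes "1.005".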