public inbox for systemtap@sourceware.org
* RE: [perfmon] Re: perfmon2 TODO list (4/4)
@ 2006-04-13 22:55 Stone, Joshua I
  2006-04-13 23:05 ` Frank Ch. Eigler
  0 siblings, 1 reply; 14+ messages in thread
From: Stone, Joshua I @ 2006-04-13 22:55 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Frank Ch. Eigler wrote:
> The translator could cross-compile for several alternative cpu
> flavours (all within an architecture, or selected ones identified by
> the user, like a "fat" executable).  The module would pick the
> appropriate PMU configuration set during initialization, or else
> abort.
> 
>> If we can delay this until runtime, it could still be managed in
>> user-mode via the stpd daemon [...]
> 
> That is an alternate possibility, one somewhat less preferable.

I would hate to require SystemTap to know all of the possible variations
that should be included in the fat binary - that is the point of having
libpfm in the first place.  A statement like "all within an
architecture" is still painful when dealing with the P4.  In a
"delayed-resolution" model, SystemTap can remain ignorant of
architectural differences.

Josh

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-13 22:55 [perfmon] Re: perfmon2 TODO list (4/4) Stone, Joshua I
@ 2006-04-13 23:05 ` Frank Ch. Eigler
  0 siblings, 0 replies; 14+ messages in thread
From: Frank Ch. Eigler @ 2006-04-13 23:05 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

Hi -

Josh wrote:

> I would hate to require SystemTap to know all of the possible
> variations that should be included in the fat binary - that is the
> point of having libpfm in the first place.

We would still use libpfm itself, but instead of asking it to generate
PMC data for one particular CPU only, we would ask it to generate
several candidates.

> A statement like "all within an architecture" is still painful when
> dealing with the P4.

How painful do you mean?  A few dozen variants would still take only a
couple of hundred bytes of "fat" PMC data.

> In a "delayed-resolution" model, SystemTap can remain ignorant of
> architectural differences.

Downsides of having stpd resolve this stuff include the loss of
proximity to script code for purposes such as error localization,
advice heuristics; reliance on a smarter stpd precludes operation
without it (such as in the boot-time probing scenario of bug #2035);
it could preclude representation of CPU flavours to tapsets for
purposes of event name abstraction.

- FChE

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-18  8:44                   ` Stephane Eranian
@ 2006-04-18 11:41                     ` Ananth N Mavinakayanahalli
  0 siblings, 0 replies; 14+ messages in thread
From: Ananth N Mavinakayanahalli @ 2006-04-18 11:41 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: William Cohen, Frank Ch. Eigler, perfmon, systemtap

On Tue, Apr 18, 2006 at 01:38:57AM -0700, Stephane Eranian wrote:
> Will,
> 
> On Mon, Apr 17, 2006 at 09:56:57AM -0400, William Cohen wrote:
> > Stephane Eranian wrote:
> > >Frank,
> > >
> > >On Thu, Apr 13, 2006 at 06:22:51PM -0400, Frank Ch. Eigler wrote:
> > >
> > >>>I have another question related maybe more to kprobes and how the
> > >>>intercept is done: breakpoints, code rewriting. If you use
> > >>>breakpoints, then I wonder how this works in SMP machines. Do you
> > >>>intervene on each CPU?
> > >>
> > >>That's right: as each CPU trips across a breakpoint, it is made to
> > >>run our handler, then single-step across the original instruction,
> > >>then resume.  It's a multi-step process described in kprobes
> > >>documentation.  From systemtap's point of view, it's a black box.
> > >>
> > >
> > >So you are saying that kprobes takes care of programming the debug
> > >registers on all CPUs if necessary.
> > >
> > 
> > Kprobes uses breakpoint instructions, so the breakpoint registers on the
> > processor are not currently being used. Thus, a breakpoint instruction
> > is placed at the location where the probe is desired and the processor's
> > debugging registers are not touched by kprobes.
> >
> > There has been some discussion about SystemTap producing probes that use
> > the processor's debug hardware to watch for accesses to specific memory
> > locations.
> > 
> So from what you are saying, neither kprobes nor systemtap uses IPIs for any setup/tear
> down at this point. Is that right?

That is right. All we do is call flush_icache_range() when modifying
text (at registration and unregistration of a kprobe).
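A user-space simulation of that registration/unregistration step, assuming x86 (where the breakpoint instruction is `int3`, opcode 0xCC). It patches an ordinary byte buffer instead of kernel text, so the hypothetical `flush_text()` only marks where `flush_icache_range()` would be called; note that no per-CPU work or IPI appears anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BREAKPOINT_OPCODE 0xCC  /* x86 int3; other architectures differ */

/* Hypothetical probe record: where the breakpoint sits and the byte it
 * displaced, so unregistration can restore the original instruction. */
struct probe {
    uint8_t *addr;
    uint8_t saved;
    int armed;
};

/* Placeholder for flush_icache_range(): this simulation patches a data
 * buffer, so there is nothing to flush. */
static void flush_text(uint8_t *start, size_t len)
{
    (void)start;
    (void)len;
}

/* Registration: save the original byte, write the breakpoint, flush. */
void arm_probe(struct probe *p, uint8_t *addr)
{
    assert(!p->armed);
    p->addr = addr;
    p->saved = *addr;
    *addr = BREAKPOINT_OPCODE;
    flush_text(addr, 1);
    p->armed = 1;
}

/* Unregistration: put the original byte back, flush again. */
void disarm_probe(struct probe *p)
{
    assert(p->armed);
    *p->addr = p->saved;
    flush_text(p->addr, 1);
    p->armed = 0;
}
```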

Ananth

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-17 13:57                 ` William Cohen
@ 2006-04-18  8:44                   ` Stephane Eranian
  2006-04-18 11:41                     ` Ananth N Mavinakayanahalli
  0 siblings, 1 reply; 14+ messages in thread
From: Stephane Eranian @ 2006-04-18  8:44 UTC (permalink / raw)
  To: William Cohen; +Cc: Frank Ch. Eigler, perfmon, systemtap

Will,

On Mon, Apr 17, 2006 at 09:56:57AM -0400, William Cohen wrote:
> Stephane Eranian wrote:
> >Frank,
> >
> >On Thu, Apr 13, 2006 at 06:22:51PM -0400, Frank Ch. Eigler wrote:
> >
> >>>I have another question related maybe more to kprobes and how the
> >>>intercept is done: breakpoints, code rewriting. If you use
> >>>breakpoints, then I wonder how this works in SMP machines. Do you
> >>>intervene on each CPU?
> >>
> >>That's right: as each CPU trips across a breakpoint, it is made to
> >>run our handler, then single-step across the original instruction,
> >>then resume.  It's a multi-step process described in kprobes
> >>documentation.  From systemtap's point of view, it's a black box.
> >>
> >
> >So you are saying that kprobes takes care of programming the debug
> >registers on all CPUs if necessary.
> >
> 
> Kprobes uses breakpoint instructions, so the breakpoint registers on the
> processor are not currently being used. Thus, a breakpoint instruction
> is placed at the location where the probe is desired and the processor's
> debugging registers are not touched by kprobes.
>
> There has been some discussion about SystemTap producing probes that use
> the processor's debug hardware to watch for accesses to specific memory
> locations.
> 
So from what you are saying, neither kprobes nor systemtap uses IPIs for any setup/tear
down at this point. Is that right?

-- 

-Stephane

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-13 22:29               ` Stephane Eranian
  2006-04-13 22:33                 ` Frank Ch. Eigler
@ 2006-04-17 13:57                 ` William Cohen
  2006-04-18  8:44                   ` Stephane Eranian
  1 sibling, 1 reply; 14+ messages in thread
From: William Cohen @ 2006-04-17 13:57 UTC (permalink / raw)
  To: eranian; +Cc: Frank Ch. Eigler, perfmon, systemtap

Stephane Eranian wrote:
> Frank,
> 
> On Thu, Apr 13, 2006 at 06:22:51PM -0400, Frank Ch. Eigler wrote:
> 
>>>I have another question related maybe more to kprobes and how the
>>>intercept is done: breakpoints, code rewriting. If you use
>>>breakpoints, then I wonder how this works in SMP machines. Do you
>>>intervene on each CPU?
>>
>>That's right: as each CPU trips across a breakpoint, it is made to
>>run our handler, then single-step across the original instruction,
>>then resume.  It's a multi-step process described in kprobes
>>documentation.  From systemtap's point of view, it's a black box.
>>
> 
> So you are saying that kprobes takes care of programming the debug
> registers on all CPUs if necessary.
> 

Kprobes uses breakpoint instructions, so the breakpoint registers on the
processor are not currently being used. Thus, a breakpoint instruction
is placed at the location where the probe is desired and the processor's
debugging registers are not touched by kprobes.

There has been some discussion about SystemTap producing probes that use
the processor's debug hardware to watch for accesses to specific memory
locations.

-Will

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-13 23:39 Stone, Joshua I
@ 2006-04-14  1:43 ` Frank Ch. Eigler
  0 siblings, 0 replies; 14+ messages in thread
From: Frank Ch. Eigler @ 2006-04-14  1:43 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

Hi -

Josh wrote:

> Can libpfm enumerate the possible variations, so we know which
> candidates to ask for?

Even if it can't do it today, we will have ample time to get it added.

> [....]  I guess the only concern I have left is making sure that the
> PMC data we have match the processor we're loading on.  [...]

Absolutely.  Once we undertake cross-compilation seriously, we will
need to do similar assertions, for even basic things like "is module X
really loaded? (and if so, where?)".
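As an example of such an assertion, a minimal C sketch that checks one line of `/proc/modules` for a named, live module and extracts its load address. It assumes the usual field order (name, size, reference count, dependencies, state, address) and treats every field before the state as an opaque token; the function name is hypothetical. On some kernels the address reads as zero for unprivileged readers, so a zero result should not be trusted as a real location.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and stores the load address in *addr if `line` describes
 * module `want` in the "Live" state; returns 0 otherwise. */
int module_line_matches(const char *line, const char *want,
                        unsigned long long *addr)
{
    char name[64], state[16];
    unsigned long long a;

    assert(line != NULL && want != NULL && addr != NULL);
    /* name, then size / refcount / deps skipped, then state and address */
    if (sscanf(line, "%63s %*s %*s %*s %15s %llx", name, state, &a) != 3)
        return 0;
    if (strcmp(name, want) != 0 || strcmp(state, "Live") != 0)
        return 0;
    *addr = a;
    return 1;
}
```

A caller would `fopen("/proc/modules", "r")` and pass each `fgets()` line to this function until it returns 1.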

- FChE

* RE: [perfmon] Re: perfmon2 TODO list (4/4)
@ 2006-04-13 23:39 Stone, Joshua I
  2006-04-14  1:43 ` Frank Ch. Eigler
  0 siblings, 1 reply; 14+ messages in thread
From: Stone, Joshua I @ 2006-04-13 23:39 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Frank Ch. Eigler wrote:
>> I would hate to require SystemTap to know all of the possible
>> variations that should be included in the fat binary - that is the
>> point of having libpfm in the first place.
> We would still use libpfm itself, but instead of asking it to generate
> PMC data for one particular CPU only, we would ask it to generate
> several candidates.

Can libpfm enumerate the possible variations, so we know which
candidates to ask for?

>> A statement like "all within an architecture" is still painful when
>> dealing with the P4.
> 
> How painful do you mean?  A few dozen variants would still take only a
> couple of hundred bytes of "fat" PMC data.

I mean painful from a maintenance perspective.  Somehow you have to know
which variations to ask for, and suddenly SystemTap is forced to track
all new CPU releases.

Libpfm has to do this already, so if we can get this from libpfm then
it's not so bad.

> Downsides of having stpd resolve this stuff include the loss of
> proximity to script code for purposes such as error localization,
> advice heuristics; reliance on a smarter stpd precludes operation
> without it (such as in the boot-time probing scenario of bug #2035);
> it could preclude representation of CPU flavours to tapsets for
> purposes of event name abstraction.

All good points.

I guess the only concern I have left is making sure that the PMC data we
have match the processor we're loading on.  This needs to be addressed
regardless of whether we allow multiple CPU targets.  I don't think this
is necessarily a difficult task, but it's definitely required.


Josh

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-13 22:29               ` Stephane Eranian
@ 2006-04-13 22:33                 ` Frank Ch. Eigler
  2006-04-17 13:57                 ` William Cohen
  1 sibling, 0 replies; 14+ messages in thread
From: Frank Ch. Eigler @ 2006-04-13 22:33 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: perfmon, systemtap

Hi -

> > That's right: as each CPU trips across a breakpoint, it is made to
> > run our handler, then single-step across the original instruction,
> > then resume.  [...]
>
> So you are saying that kprobes takes care of programming the debug
> registers on all CPUs if necessary.

To the extent kprobes touches the debug registers at all (which it may
not), it would do so only during that single-stepping part.  It would
not help us for the PMU setup, but that's OK.  kprobes is only one of
the systemtap event sources.

- FChE

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-13 22:23             ` Frank Ch. Eigler
@ 2006-04-13 22:29               ` Stephane Eranian
  2006-04-13 22:33                 ` Frank Ch. Eigler
  2006-04-17 13:57                 ` William Cohen
  0 siblings, 2 replies; 14+ messages in thread
From: Stephane Eranian @ 2006-04-13 22:29 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: perfmon, systemtap

Frank,

On Thu, Apr 13, 2006 at 06:22:51PM -0400, Frank Ch. Eigler wrote:
> 
> > I have another question related maybe more to kprobes and how the
> > intercept is done: breakpoints, code rewriting. If you use
> > breakpoints, then I wonder how this works in SMP machines. Do you
> > intervene on each CPU?
> 
> That's right: as each CPU trips across a breakpoint, it is made to
> run our handler, then single-step across the original instruction,
> then resume.  It's a multi-step process described in kprobes
> documentation.  From systemtap's point of view, it's a black box.
> 
So you are saying that kprobes takes care of programming the debug
registers on all CPUs if necessary.

-- 

-Stephane

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-13 22:01           ` Stephane Eranian
@ 2006-04-13 22:23             ` Frank Ch. Eigler
  2006-04-13 22:29               ` Stephane Eranian
  0 siblings, 1 reply; 14+ messages in thread
From: Frank Ch. Eigler @ 2006-04-13 22:23 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: perfmon, systemtap

Hi -

On Thu, Apr 13, 2006 at 02:55:44PM -0700, Stephane Eranian wrote:
> [...]
> > From a random kprobe handler routine, we'd only ever *read* a counter.
> This can happen at any time and anywhere in the kernel, including
> in critical sections. Is that correct? 

Yes.

> Similarly, can this happen in sections of code where virtual
> addressing is turned off?

I am not familiar enough to know, but I expect so.  To the extent that
such conditions are detectable at runtime (akin to in_interrupt()), we
can block access from such contexts at runtime, if that were required.


> > From a sampling event interrupt, we'd also only *read* a counter.
>
> Ok. I can see you possibly reading a bunch of counters. The key is
> that on counter overflow you want to get called.

That's right.


> I have another question related maybe more to kprobes and how the
> intercept is done: breakpoints, code rewriting. If you use
> breakpoints, then I wonder how this works in SMP machines. Do you
> intervene on each CPU?

That's right: as each CPU trips across a breakpoint, it is made to
run our handler, then single-step across the original instruction,
then resume.  It's a multi-step process described in kprobes
documentation.  From systemtap's point of view, it's a black box.

> Another way to ask this: during initialization/teardown of a script,
> do you need to operate only on one CPU, or do you have some state to
> propagate to the other CPUs as well?

At the moment, one CPU can do all the initialization.

> For the PMU, you need to program the counters on all CPUs
> (system-wide). The current design requires that a context be created
> and bound to each CPU.

We haven't had to perform explicit multi-CPU initialization, but we
certainly could start, using IPIs or a more elaborate mechanism.  Setup
is an infrequent, unhurried event.
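A user-space sketch of system-wide setup with one context bound per CPU, as Stephane describes for perfmon2. The loop in the hypothetical `for_each_cpu_sim()` stands in for an IPI broadcast such as the kernel's `on_each_cpu()` helper; what it illustrates is that setup and teardown each touch every CPU exactly once, outside any time-critical path.

```c
#include <assert.h>

#define MAX_CPUS 8  /* hypothetical bound for the sketch */

/* Hypothetical per-CPU PMU context; system-wide monitoring binds one
 * context to each CPU. */
struct cpu_pmu_ctx {
    int cpu;
    int loaded;
};

static struct cpu_pmu_ctx ctx[MAX_CPUS];

/* Per-CPU work. In the kernel this would run on the target CPU itself. */
static void setup_one(int cpu)
{
    ctx[cpu].cpu = cpu;
    ctx[cpu].loaded = 1;
}

static void teardown_one(int cpu)
{
    ctx[cpu].loaded = 0;
}

/* Run fn once per online CPU; a sequential stand-in for an IPI broadcast. */
static void for_each_cpu_sim(int ncpus, void (*fn)(int))
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    for (int cpu = 0; cpu < ncpus; cpu++)
        fn(cpu);
}
```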

- FChE

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-13 20:22         ` Frank Ch. Eigler
@ 2006-04-13 22:01           ` Stephane Eranian
  2006-04-13 22:23             ` Frank Ch. Eigler
  0 siblings, 1 reply; 14+ messages in thread
From: Stephane Eranian @ 2006-04-13 22:01 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: perfmon, systemtap

Frank,

On Thu, Apr 13, 2006 at 04:22:11PM -0400, Frank Ch. Eigler wrote:
> 
> > I need to better understand why this has to be done from the kernel and
> > also under which circumstances. Is this when you insert your probe
> > code, i.e., as a result of a user action? Is this when execution
> > reaches an instrumentation point (kprobe)? [...]
> 
> There are three contexts of interactions being contemplated:
> 
> From a random kprobe handler routine, we'd only ever *read* a counter.
> 
This can happen at any time and anywhere in the kernel, including
in critical sections. Is that correct? Similarly, can this happen
in sections of code where virtual addressing is turned off?

> From a sampling event interrupt, we'd also only *read* a counter.
> 
Ok. I can see you possibly reading a bunch of counters. The key is that on
counter overflow you want to get called.

> During session startup/shutdown (plain known user context), we'd only
> initialize / tear down the counters.
> 

I have another question related maybe more to kprobes and how the intercept
is done: breakpoints, code rewriting. If you use breakpoints, then I wonder
how this works in SMP machines. Do you intervene on each CPU? Another
way to ask this: during initialization/teardown of a script, do you need to
operate only on one CPU, or do you have some state to propagate to the other
CPUs as well?

For the PMU, you need to program the counters on all CPUs (system-wide). The current
design requires that a context be created and bound to each CPU. 

Can you clarify how SystemTap works in this area?

-- 
-Stephane

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
  2006-04-13 21:21 Stone, Joshua I
@ 2006-04-13 21:28 ` Frank Ch. Eigler
  0 siblings, 0 replies; 14+ messages in thread
From: Frank Ch. Eigler @ 2006-04-13 21:28 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap, perfmon

Hi -

On Thu, Apr 13, 2006 at 02:21:45PM -0700, Stone, Joshua I wrote:
> [...]
> > [...]  During systemtap script compilation, the event names
> > would be translated to low level PMU register configurations.
> 
> We need to be careful for the case of pre-compiled or cross-compiled
> modules.  If all of the translation and checks happen at compile time,
> then we'll need to check at init time whether we're still on a
> compatible cpu.

The translator could cross-compile for several alternative cpu
flavours (all within an architecture, or selected ones identified by
the user, like a "fat" executable).  The module would pick the
appropriate PMU configuration set during initialization, or else
abort.
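A minimal C sketch of the "fat" selection step, assuming the translator emits one table of register writes per candidate CPU flavour and that each candidate is keyed by a hypothetical (family, model) pair. The struct and function names are illustrative, not SystemTap or perfmon2 API; a real module would obtain the running CPU's identity from the kernel and abort initialization when no candidate matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU identity; a real module would read it from the kernel. */
struct cpu_id {
    unsigned family;
    unsigned model;
};

/* One PMC register write: which register, and the value to put in it. */
struct pmc_write {
    unsigned reg;
    uint64_t value;
};

/* One candidate configuration set, generated ahead of time for one flavour. */
struct pmu_config {
    struct cpu_id cpu;
    const struct pmc_write *writes;
    size_t nwrites;
};

/* Return the candidate built for the running CPU, or NULL if there is none,
 * in which case module initialization should abort. */
const struct pmu_config *select_pmu_config(const struct pmu_config *cands,
                                           size_t ncands, struct cpu_id running)
{
    assert(cands != NULL || ncands == 0);
    for (size_t i = 0; i < ncands; i++) {
        if (cands[i].cpu.family == running.family &&
            cands[i].cpu.model == running.model)
            return &cands[i];
    }
    return NULL;
}
```

Returning NULL rather than falling back to a "closest" candidate matches the abort-on-mismatch behaviour described above; the single-target case is simply a table of length one.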

> If we can delay this until runtime, it could still be managed in
> user-mode via the stpd daemon [...]

That is an alternate possibility, one somewhat less preferable.


- FChE

* RE: [perfmon] Re: perfmon2 TODO list (4/4)
@ 2006-04-13 21:21 Stone, Joshua I
  2006-04-13 21:28 ` Frank Ch. Eigler
  0 siblings, 1 reply; 14+ messages in thread
From: Stone, Joshua I @ 2006-04-13 21:21 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap, perfmon, Stephane Eranian

Frank Ch. Eigler wrote:
>>>> Some options to get context setup:
>>>> 	-generate bit patterns when compiling systemtap script
>>>> 		and generate static array with information for context,
>>>> 		pmc, and pmd setup.
> 
> This is what we hope to do.  That means that we only would need the
> low-level register manipulation API in the kernel, and not the
> abstract event naming, configuration/compatibility checking, etc.
> [...]
> That is correct.  During systemtap script compilation, the event names
> would be translated to low level PMU register configurations.

We need to be careful for the case of pre-compiled or cross-compiled
modules.  If all of the translation and checks happen at compile time,
then we'll need to check at init time whether we're still on a
compatible cpu.

If we can delay this until runtime, it could still be managed in
user-mode via the stpd daemon, and we'd be guaranteed that the PMU
values match the current CPU.


Josh

* Re: [perfmon] Re: perfmon2 TODO list (4/4)
       [not found]       ` <20060413200223.GD30718@frankl.hpl.hp.com>
@ 2006-04-13 20:22         ` Frank Ch. Eigler
  2006-04-13 22:01           ` Stephane Eranian
  0 siblings, 1 reply; 14+ messages in thread
From: Frank Ch. Eigler @ 2006-04-13 20:22 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: perfmon, systemtap

Hi -

On Thu, Apr 13, 2006 at 01:02:23PM -0700, Stephane Eranian wrote:

> [...]
> > I think the simplest approach would be to provide the current
> > syscall API set as an in-kernel API set as well.  [...]

> I need to better understand why this has to be done from the kernel and
> also under which circumstances. Is this when you insert your probe
> code, i.e., as a result of a user action? Is this when execution
> reaches an instrumentation point (kprobe)? [...]

There are three contexts of interactions being contemplated:

From a random kprobe handler routine, we'd only ever *read* a counter.

From a sampling event interrupt, we'd also only *read* a counter.

During session startup/shutdown (plain known user context), we'd only
initialize / tear down the counters.


> [...]  I can more easily see access from a per-CPU point of view
> rather than per-thread.

Reading "current thread" values would also be useful, if the
management code was already tracking that state (context-switching
it).
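Per-thread values require the management code to virtualize the counter across context switches. A minimal sketch, assuming a hypothetical raw counter value `hw_now` supplied by the caller at each switch and read: the thread accumulates the delta from each time slice, and a read while running adds the partial delta of the current slice.

```c
#include <assert.h>
#include <stdint.h>

/* Per-thread virtualized counter (hypothetical names): accumulated holds
 * events from earlier time slices; hw_at_switch_in is the raw hardware
 * value when the thread was last scheduled in. */
struct thread_counter {
    uint64_t accumulated;
    uint64_t hw_at_switch_in;
    int running;
};

/* Called when the thread is scheduled onto the CPU. */
void counter_switch_in(struct thread_counter *t, uint64_t hw_now)
{
    assert(!t->running);
    t->hw_at_switch_in = hw_now;
    t->running = 1;
}

/* Called when the thread is scheduled off the CPU. */
void counter_switch_out(struct thread_counter *t, uint64_t hw_now)
{
    assert(t->running);
    t->accumulated += hw_now - t->hw_at_switch_in;
    t->running = 0;
}

/* "Current thread" value, readable at any time (e.g. from a probe). */
uint64_t counter_read(const struct thread_counter *t, uint64_t hw_now)
{
    return t->accumulated + (t->running ? hw_now - t->hw_at_switch_in : 0);
}
```

Unsigned subtraction tolerates one wraparound of a full 64-bit counter; real PMD registers are often narrower, so a real implementation would mask the difference to the counter width.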

> [...]
> System-wide is across all threads running on a CPU by definition.

(We hope to address reasonable configurations of SMP data gathering also.)


> > > Some options to get context setup:
> > > 	-generate bit patterns when compiling systemtap script
> > > 		and generate static array with information for context,
> > > 		pmc, and pmd setup.

This is what we hope to do.  That means that we only would need the
low-level register manipulation API in the kernel, and not the
abstract event naming, configuration/compatibility checking, etc.
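A sketch of what such generated static data might look like, together with the only kernel-side logic it needs: copying register/value pairs into the PMU. The structs, table names, and values are hypothetical placeholders rather than real event encodings, and the register files are simulated as plain arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for translator-generated setup data. */
struct pmc_setting { uint16_t reg; uint64_t value; };  /* config register */
struct pmd_setting { uint16_t reg; uint64_t reset; };  /* data register */

/* Placeholder values, not real event encodings. */
static const struct pmc_setting stap_pmcs[] = { { 0, 0x1 }, { 1, 0x2 } };
static const struct pmd_setting stap_pmds[] = { { 0, 0 }, { 1, 0 } };

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Copy the precomputed settings into (simulated) PMC and PMD register
 * files. No event names or compatibility logic are involved; returns the
 * number of registers written. */
size_t apply_static_setup(uint64_t *pmc, size_t npmc,
                          uint64_t *pmd, size_t npmd)
{
    size_t written = 0;
    for (size_t i = 0; i < NELEMS(stap_pmcs); i++, written++) {
        assert(stap_pmcs[i].reg < npmc);
        pmc[stap_pmcs[i].reg] = stap_pmcs[i].value;
    }
    for (size_t i = 0; i < NELEMS(stap_pmds); i++, written++) {
        assert(stap_pmds[i].reg < npmd);
        pmd[stap_pmds[i].reg] = stap_pmds[i].reset;
    }
    return written;
}
```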

> > This approach might relieve us from having to provide the in-kernel API 
> > discussed above.
>
> I looked at Will's proposal at:
> 	http://sources.redhat.com/ml/systemtap/2006-q1/msg00800.html
> There is something confusing about this proposal. I don't know
> at which level this interface operates:  kernel or user? [...]

My followup to that message will clarify this.


> In general I do not think that passing event names to the kernel is
> a good idea.  [...]  This is something better done in a user
> library. But maybe you meant the string contains the value and the
> PMU register to program instead of the logical event name.

That is correct.  During systemtap script compilation, the event names
would be translated to low level PMU register configurations.
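On the translator side, that name-to-register mapping might look like the following C sketch, which looks up an event name and emits a C initializer line for the generated module. The table contents are placeholders; in practice the encodings would come from a user-level library such as libpfm, and the function names here are hypothetical. Unknown names fail at compile time, which keeps error reporting close to the script.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical translator-side event table with placeholder encodings. */
struct event_encoding {
    const char *name;
    uint16_t reg;
    uint64_t value;
};

static const struct event_encoding known_events[] = {
    { "cycles",       0, 0x1 },   /* placeholder encoding */
    { "instructions", 1, 0x2 },   /* placeholder encoding */
};

const struct event_encoding *lookup_event(const char *name)
{
    for (size_t i = 0; i < sizeof known_events / sizeof known_events[0]; i++)
        if (strcmp(known_events[i].name, name) == 0)
            return &known_events[i];
    return NULL;
}

/* Emit one C initializer for the generated module, so that only
 * register/value pairs, never event names, reach the kernel. */
int emit_pmc_entry(char *buf, size_t len, const char *name)
{
    const struct event_encoding *e = lookup_event(name);
    assert(buf != NULL && len > 0);
    if (e == NULL)
        return -1;   /* unknown event: report at script compile time */
    return snprintf(buf, len, "{ %u, 0x%llxULL },", (unsigned)e->reg,
                    (unsigned long long)e->value);
}
```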

- FChE
