proposed instruction trace support in SystemTap

public inbox for systemtap@sourceware.org

* proposed instruction trace support in SystemTap
@ 2007-07-02 23:01 Dave Nomura
  2007-07-05 19:37 ` Frank Ch. Eigler
  2007-07-06 21:39 ` Maynard Johnson
  0 siblings, 2 replies; 27+ messages in thread
From: Dave Nomura @ 2007-07-02 23:01 UTC
  To: systemtap

PROPOSED INSTRUCTION TRACING INTEGRATION INTO SYSTEMTAP
A user would write a stap script that identified where to turn on 
instruction tracing and where to turn it off.  The stap script could be 
invoked via: stap <user_script>.stp -b -M -c "<some program>" -o 
<trace_data>
to allow the stap script to access the pid to trace, although other 
usage modes are also possible.  Some enhancements to the stap translator 
are suggested below to support instruction tracing.

SINGLE_STEP/BRANCH TRAP HANDLER
The user's stap script would also need to define an instruction trace 
handler and insert their own body for the handler.  This might look like:
probe single_step label("single_step handler 1")
{
        <do whatever you want for each single stepped instruction>
        itrace_output();        // write itrace binary output
}

probe branch_step label("branch handler 1")
{
        <do whatever you want for each branch instruction>
        itrace_output();        // write itrace binary output
}

where "label" is a language extension to attach a name to an instruction 
trace probe that would allow you to have different instruction trace 
handlers for different instruction trace probes.  There would only be 
one single_step trap handler but it would use the label to decide which 
code to execute from the user's stap script.

The itrace_output() is a function that produces the raw trace data that 
could then be post processed for consumption by various performance 
analysis tools but the user could do something as simple as printing out 
the PC value.  It might be nice if there was some way to name the relay 
streams so that they aren't intermingled.  Maybe something analogous to 
the stream parameter to fprintf.

The SystemTap translator would generate calls to target dependent code 
to implement single instruction or branch trapping.  This is done a 
variety of ways on different architectures, but generally involves 
setting a bit in a system register to enable single instruction/branch 
trapping.

TURNING ON/OFF TRACING
The user's stap script would turn on/off instruction tracing by creating 
a uprobe containing a call to a SystemTap itrace tapset function for 
turning on/off instruction tracing.  I don't know what SystemTap's 
uprobe interface will look like but it might be something like:
probe process(target()).function("function_to_trace")
{
        itrace_on_pid("single_step handler 1", pid())
}

probe process(target()).function("function_to_trace").return
{
        itrace_off(pid())
}

Note:
- instruction tracing enabled for a parent process id will enable 
tracing for all of its children (threads). Since uprobes are on a 
per-process basis rather than per-thread, instruction tracing would be 
constrained to the same semantics, although it would be possible for a 
user to write their single step handler to treat some threads individually.
- The "single_step handler 1" parameter is the label attached to the 
instruction trace handler above to allow one to have different handlers 
for different instruction trace probes.
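
The label mechanism can be modelled outside the kernel.  What follows is
only a minimal Python sketch of the dispatch idea, not SystemTap or
kernel code: one shared trap entry point looks up which labelled handler
body to run for a given pid.  The names handlers, active_label, register
and single_step_trap are hypothetical; itrace_on_pid and itrace_off only
mimic the proposed tapset functions.

```python
# Hypothetical user-space model: one trap handler, many labelled bodies.
handlers = {}        # label -> handler body from the user's script
active_label = {}    # pid -> label selected by itrace_on_pid()


def register(label):
    """Attach a label to a handler body, like label("...") in the proposal."""
    def wrap(fn):
        handlers[label] = fn
        return fn
    return wrap


def itrace_on_pid(label, pid):
    """Select which labelled handler runs for this pid."""
    if label not in handlers:
        raise KeyError("no handler labelled %r" % label)
    active_label[pid] = label


def itrace_off(pid):
    """Stop tracing this pid; harmless if it was not being traced."""
    active_label.pop(pid, None)


def single_step_trap(pid, pc):
    """The single trap entry point: find the label, run that body."""
    label = active_label.get(pid)
    if label is None:
        return None              # tracing is off for this pid
    return handlers[label](pid, pc)


@register("single_step handler 1")
def handler_1(pid, pc):
    return "pid %d pc 0x%x" % (pid, pc)
```

With this shape, a mismatch between the label on the handler and the
label passed to itrace_on_pid() shows up as an error when tracing is
turned on rather than as a silently missing trace.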

INITIALIZATION/CLEANUP
Initialization/cleanup of the instruction tracing feature could be done 
by insertion of a call to an itrace initialization/cleanup routine in 
the user's begin/end probes.

probe begin
{
        itrace_init(<some params>)
}

probe end
{
        itrace_cleanup()
}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: proposed instruction trace support in SystemTap
  2007-07-02 23:01 proposed instruction trace support in SystemTap Dave Nomura
@ 2007-07-05 19:37 ` Frank Ch. Eigler
  2007-07-06 12:46   ` grundy
                     ` (2 more replies)
  2007-07-06 21:39 ` Maynard Johnson
  1 sibling, 3 replies; 27+ messages in thread
From: Frank Ch. Eigler @ 2007-07-05 19:37 UTC
  To: dcnltc; +Cc: systemtap

Dave Nomura <dcnltc@us.ibm.com> writes:

> [...]

Thanks for continuing with this idea.

> SINGLE_STEP/BRANCH TRAP HANDLER
> [...]
> probe branch_step label("branch handler 1")
> {
>         <do whatever you want for each branch instruction>
>         itrace_output();        // write itrace binary output
> }
> 
> where "label" is a language extension to attach a name [...]

Particularly, to turn the probe on and off by explicit function calls.
This is an area we discussed at the face-to-face meeting in Ottawa
last week, in relation to user-space probes.  The same concept could
apply to other probe types.

Regarding semantics, this is tricky business.  Turning off active
probes is relatively simple, because even if the underlying probe API
doesn't support instantaneous (atomic) disarming, we can simulate it
until the API catches up (by adding an "am I supposed to be disarmed?"
conditional to the handler).  Turning them *on* is different - we
can't help but possibly miss a couple of events as the API catches up.

Maybe this is acceptable, maybe not.  Some syntax may help tell us the
judgement of the script programmer.

Regarding syntax, we have more options than an opaque string and
explicit function calls to turn things on and off.  We could have a
guard expression like dtrace's /.../ - though we would probably just
spell it thusly:

    probe PROBEPOINT if (expr) { }

where expr could be something as simple as (probe_1_enabled_p), which
better be a global variable.

The compiler would analyze expr for dataflow, arrange to evaluate this
condition whenever appropriate (after another probe writes any of its
inputs), and arrange to promptly activate or deactivate the
appropriate probes.  Since "promptly" may take some time, script
programmers plopping a conditional like this in are implying consent
to a few events being missed.
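
To make the dataflow idea concrete, here is a minimal Python model, not
translator code.  Each probe records which globals its guard reads, and
a write to one of those globals re-evaluates only the affected guards.
The class and method names (Probe, Session, write_global) are
hypothetical, and the arming here is instantaneous, whereas a real
implementation may lag and miss a few events.

```python
class Probe:
    def __init__(self, name, condition, inputs):
        self.name = name
        self.condition = condition    # callable taking the globals dict
        self.inputs = set(inputs)     # globals that the guard reads
        self.armed = False


class Session:
    def __init__(self):
        self.globals = {}
        self.probes = []

    def add_probe(self, probe):
        """Register a probe and evaluate its guard once at startup."""
        self.probes.append(probe)
        probe.armed = bool(probe.condition(self.globals))

    def write_global(self, name, value):
        """Store a value, then re-evaluate only the guards that read it."""
        self.globals[name] = value
        for probe in self.probes:
            if name in probe.inputs:
                probe.armed = bool(probe.condition(self.globals))
```

A guard such as (probe_1_enabled_p) then corresponds to a condition that
reads one global, and writes to unrelated globals cost nothing.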

> The itrace_output() is a function that produces the raw trace data
> that could then be post processed for consumption by various
> performance analysis tools but the user could do something as simple
> as printing out the PC value.

Is the "raw trace data" a well-defined thing?  Why would this sort of
hard-coded data set be desirable, as opposed to letting the programmer
write something explicit like:
   printf("%2b%8b%4b", cpu(), get_ticks(), pc())
(Of course this can be hidden in a function of his own, or in an
inspectable tapset.)
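
For anyone writing a post-processor, the record produced by such a
printf can be modelled in Python as below.  This is a sketch under
assumptions: that the three binary fields are written back to back with
no padding, in host byte order, as 2, 8 and 4 bytes, and that the pc
value fits in 32 bits.  The names RECORD, pack_record and unpack_stream
are made up for the example.

```python
import struct

# "=" selects native byte order with no padding: 2 + 8 + 4 = 14 bytes.
RECORD = struct.Struct("=HQI")


def pack_record(cpu, ticks, pc):
    """Build one record laid out like "%2b%8b%4b"."""
    return RECORD.pack(cpu, ticks, pc)


def unpack_stream(data):
    """Split back-to-back records into (cpu, ticks, pc) tuples."""
    last_start = len(data) - RECORD.size
    return [RECORD.unpack_from(data, off)
            for off in range(0, last_start + 1, RECORD.size)]
```

Because the layout is spelled out in the script, the reader needs
nothing beyond the format string to decode it.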

> It might be nice if there was some way to name the relay streams so
> that they aren't intermingled.  Maybe something analogous to the
> stream parameter to fprintf.

Something similar was mentioned as desirable in the OLS2007 talk by
Bligh / Desnoyers on Google's ktrace & LTTng.  There, the context was
an occasional need to have separate buffers for high-volume and
low-volume messages, so that buffer overflows did not penalize the
smaller messages too much.  Let's think about this some more.
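
The buffer-separation argument can be illustrated with a small Python
model.  It is not relay code: the Streams class is hypothetical, and it
assumes an overwrite-oldest policy on overflow, which a real transport
may not use.

```python
from collections import deque


class Streams:
    def __init__(self, sizes):
        # One bounded buffer per named stream; when a buffer is full,
        # appending drops its oldest entry.
        self.buffers = {name: deque(maxlen=size)
                        for name, size in sizes.items()}

    def write(self, stream, record):
        self.buffers[stream].append(record)

    def read(self, stream):
        return list(self.buffers[stream])
```

With one stream for the instruction trace and another for rare events,
a flood of trace records can only evict other trace records.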

> The SystemTap translator would generate calls to target dependent
> code to implement single instruction or branch trapping.  This is
> done a variety of ways on different architectures, but generally
> involves setting a bit in a system register to enable single
> instruction/branch trapping.

Is this sort of thing done/doable in kernel space also, or just on
user-space threads?  Is there an existing kernel API for management of
these registers/fields?

> [...] - instruction tracing enabled for a parent process id will
> enable tracing for all of its children (threads). [...]

This is a sensible behavior, though so is a per-thread alternative.
Since the tracing flags are per-thread control registers anyway,
I suspect we'll have to build the former on top of the latter.
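
A toy Python model of that layering follows, assuming only that each
thread carries its own trace flag.  The Process class and its method
names are hypothetical.

```python
class Process:
    def __init__(self, pid, tids):
        self.pid = pid
        self.tracing = False
        # Stands in for the per-thread single-step control bit.
        self.step_flag = {tid: False for tid in tids}

    def set_thread(self, tid, on):
        """The primitive operation: toggle one thread."""
        self.step_flag[tid] = on

    def set_process(self, on):
        """Per-process semantics built from per-thread operations."""
        self.tracing = on
        for tid in self.step_flag:
            self.set_thread(tid, on)

    def add_thread(self, tid):
        # A thread created while tracing is on inherits the setting.
        self.step_flag[tid] = self.tracing
```

A handler that wants to treat some threads individually can still call
the per-thread primitive after the process-wide switch.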

> [...] INITIALIZATION/CLEANUP
> Initialization/cleanup of the instruction tracing feature could be
> done by insertion of a call to an itrace initialization/cleanup
> routine in the user's begin/end probes.
> 
> probe begin
>         itrace_init(<some params>)
> probe end
>         itrace_cleanup()

Neither of these should be necessary.  The existence of
instruction-trace type probes should imply automated setup/cleanup.

- FChE

* Re: proposed instruction trace support in SystemTap
  2007-07-05 19:37 ` Frank Ch. Eigler
@ 2007-07-06 12:46   ` grundy
  2007-07-06 14:59     ` Frank Ch. Eigler
  2007-07-06 21:43   ` Maynard Johnson
  2007-07-10 14:12   ` Dave Nomura
  2 siblings, 1 reply; 27+ messages in thread
From: grundy @ 2007-07-06 12:46 UTC
  To: Frank Ch. Eigler; +Cc: dcnltc, systemtap

On Thu, Jul 05, 2007 at 03:37:02PM -0400, Frank Ch. Eigler wrote:
> Particularly, to turn the probe on and off by explicit function calls.
> <snip> 
> Regarding semantics, this is tricky business.  Turning off active
> probes is relatively simple, because even if the underlying probe API
> doesn't support instantaneous (atomic) disarming, we can simulate it
> until the API catches up (by adding an "am I supposed to be disarmed?"
> conditional to the handler).  Turning them *on* is different - we
> can't help but possibly miss a couple of events as the API catches up.

Maybe we could support two levels of disarmed? One would be probepoints
removed as discussed, the other could be probe points in, complete
handler not firing (just enough to say "we're not active" and return).

In tapscripts what I will do sometimes is have an active variable that
gets checked at the beginning of every handler. When the trigger is hit
to start recording, the variable is changed and recording begins. It
would be good to have a way to be sure that when you activate a set of
probes, that they are actually active and not on the way to being
active.

Thanks
Mike

* Re: proposed instruction trace support in SystemTap
  2007-07-06 12:46   ` grundy
@ 2007-07-06 14:59     ` Frank Ch. Eigler
  0 siblings, 0 replies; 27+ messages in thread
From: Frank Ch. Eigler @ 2007-07-06 14:59 UTC
  To: systemtap

Hi -

On Fri, Jul 06, 2007 at 08:46:19AM -0400, grundy wrote:

> [...]  Maybe we could support two levels of disarmed? One would be
> probepoints removed as discussed, the other could be probe points
> in, complete handler not firing (just enough to say "we're not
> active" and return).

The latter is essentially the same as having an "if" conditional
within the probe handler - after all, we're already paying the price
of dispatching the exception.  What I'm proposing is that we allow it
*outside* too, to represent the former case.
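
The cost difference between the two levels can be shown with a small
Python model.  The names are hypothetical: armed stands for the probe
point being installed (the outer condition) and active for a flag tested
inside the handler.

```python
class Probe:
    def __init__(self):
        self.armed = True      # outer level: is the probe point installed?
        self.active = True     # inner level: flag checked in the handler
        self.dispatches = 0    # times the exception cost was paid
        self.records = 0       # times the handler body actually ran

    def hit(self):
        if not self.armed:
            return             # no probe point, so nothing is dispatched
        self.dispatches += 1
        if not self.active:
            return             # "we're not active": return right away
        self.records += 1
```

Only the outer form avoids the dispatch itself; the inner form still
pays for the exception on every hit.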

- FChE

* Re: proposed instruction trace support in SystemTap
  2007-07-02 23:01 proposed instruction trace support in SystemTap Dave Nomura
  2007-07-05 19:37 ` Frank Ch. Eigler
@ 2007-07-06 21:39 ` Maynard Johnson
  1 sibling, 0 replies; 27+ messages in thread
From: Maynard Johnson @ 2007-07-06 21:39 UTC
  To: dcnltc; +Cc: systemtap

Dave Nomura wrote:
> PROPOSED INSTRUCTION TRACING INTEGRATION INTO SYSTEMTAP
> A user would write a stap script that identified where to turn on 
> instruction tracing and where to turn it off.  The stap script could be 
> invoked via: stap <user_script>.stp -b -M -c "<some program>" -o 
> <trace_data>
Ideally, some default behavior should be provided by new instruction 
trace scripts so that a user may not need to code any stap scripts 
themselves; i.e., user passes in program name, args, start-trace 
function, stop-trace function.
> to allow the stap script to access the pid to trace, although other 
> usage modes are also possible.  Some enhancements to the stap translator 
> are suggested below to support instruction tracing.
> 
> SINGLE_STEP/BRANCH TRAP HANDLER
> The user's stap script would also need to define an instruction trace 
A default handler should be provided, but with some mechanism to allow 
the user to override.
> handler and insert their own body for the handler.  This might look like:
> probe single_step label("single_step handler 1")
> {
>        <do whatever you want for each single stepped instruction>
>        itrace_output();        // write itrace binary output
> }
> 
> probe branch_step label("branch handler 1")
> {
>        <do whatever you want for each branch instruction>
>        itrace_output();        // write itrace binary output
> }
> 
[snip]

-Maynard

* Re: proposed instruction trace support in SystemTap
  2007-07-05 19:37 ` Frank Ch. Eigler
  2007-07-06 12:46   ` grundy
@ 2007-07-06 21:43   ` Maynard Johnson
  2007-07-07  1:58     ` Frank Ch. Eigler
  2007-07-10 14:12   ` Dave Nomura
  2 siblings, 1 reply; 27+ messages in thread
From: Maynard Johnson @ 2007-07-06 21:43 UTC
  To: Frank Ch. Eigler; +Cc: dcnltc, systemtap

Frank,
I work with Dave, and I told him I would cover for him on this issue 
while he's away from the office for a bit.  I'm very familiar with the 
existing ITrace tool (from the OSS "Performance Inspector" project) that 
our team contributes to, but not so much when it comes to SystemTap. 
But I'll give my two cents worth . . .


Frank Ch. Eigler wrote:
> Dave Nomura <dcnltc@us.ibm.com> writes:
> 
> 
>>[...]
> 
> 
> Thanks for continuing with this idea.
> 
> 
>>SINGLE_STEP/BRANCH TRAP HANDLER
>>[...]
>>probe branch_step label("branch handler 1")
>>{
>>        <do whatever you want for each branch instruction>
>>        itrace_output();        // write itrace binary output
>>}
>>
>>where "label" is a language extension to attach a name [...]
> 
> 
> Particularly, to turn the probe on and off by explicit function calls.
> This is an area we discussed at the face-to-face meeting in Ottawa
> last week, in relation to user-space probes.  The same concept could
> apply to other probe types.
> 
> Regarding semantics, this is tricky business.  Turning off active
> probes is relatively simple, because even if the underlying probe API
> doesn't support instantaneous (atomic) disarming, we can simulate it
> until the API catches up (by adding an "am I supposed to be disarmed?"
> conditional to the handler).  Turning them *on* is different - we
> can't help but possibly miss a couple of events as the API catches up.
> 
> Maybe this is acceptable, maybe not.  Some syntax may help tell us the
> judgement of the script programmer.
I don't believe users would even be aware of this gap.
> 
> 
> Regarding syntax, we have more options than an opaque string and
> explicit function calls to turn things on and off.  We could have a
> guard expression like dtrace's /.../ - though we would probably just
> spell it thusly:
> 
>     probe PROBEPOINT if (expr) { }
> 
> where expr could be something as simple as (probe_1_enabled_p), which
> better be a global variable.
Yes, I think this construct could be very useful.
> 
> The compiler would analyze expr for dataflow, arrange to evaluate this
> condition whenever appropriate (after another probe writes any of its
> inputs), and arrange to promptly activate or deactivate the
> appropriate probes.  Since "promptly" may take some time, script
> programmers plopping a conditional like this in are implying consent
> to a few events being missed.
> 
> 
> 
>>The itrace_output() is a function that produces the raw trace data
>>that could then be post processed for consumption by various
>>performance analysis tools but the user could do something as simple
>>as printing out the PC value.
> 
> 
> Is the "raw trace data" a well-defined thing?  Why would this sort of
> hard-coded data set be desirable, as opposed to letting the programmer
> write something explicit like:
>    printf("%2b%8b%4b", cpu(), get_ticks(), pc())
> (Of course this can be hidden in a function of his own, or in an
> inspectable tapset.)
In fact, the raw trace data is well-defined by the existing ITrace tool 
I mentioned above.  Of course, this definition is negotiable.  The idea 
behind this is to provide enough information in the raw trace data so 
that, for example, a tool can analyze this data and help the performance 
analyst identify the causes of pipeline stalls.
> 
> 
>>It might be nice if there was some way to name the relay streams so
>>that they aren't intermingled.  Maybe something analogous to the
>>stream parameter to fprintf.
> 
> 
> Something similar was mentioned as desirable in the OLS2007 talk by
> Bligh / Desnoyers on Google's ktrace & LTTng.  There, the context was
> an occasional need to have separate buffers for high-volume and
> low-volume messages, so that buffer overflows did not penalize the
> smaller messages too much.  Let's think about this some more.
Certainly this could be a benefit, although not a necessity for a first 
pass implementation.
> 
> 
> 
>>The SystemTap translator would generate calls to target dependent
>>code to implement single instruction or branch trapping.  This is
>>done a variety of ways on different architectures, but generally
>>involves setting a bit in a system register to enable single
>>instruction/branch trapping.
> 
> 
> Is this sort of thing done/doable in kernel space also, or just on
> user-space threads?  
The existing tool is capable of single-step tracing the kernel, with 
some exceptions.
> Is there an existing kernel API for management of
> these registers/fields?
Unfortunately, not that I'm aware of.
> 
> 
> 
>>[...] - instruction tracing enabled for a parent process id will
>>enable tracing for all of its children (threads). [...]
> 
> 
> This is a sensible behavior, though so is a per-thread alternative.
> Since the tracing flags are per-thread control registers anyway,
> I suspect we'll have to build the former on top of the latter.
> 
> 
>>[...] INITIALIZATION/CLEANUP
>>Initialization/cleanup of the instruction tracing feature could be
>>done by insertion of a call to an itrace initialization/cleanup
>>routine in the user's begin/end probes.
>>
>>probe begin
>>        itrace_init(<some params>)
>>probe end
>>        itrace_cleanup()
> 
> 
> Neither of these should be necessary.  The existence of
> instruction-trace type probes should imply automated setup/cleanup.
> 
> - FChE

Thanks very much for your comments.

-Maynard


* Re: proposed instruction trace support in SystemTap
  2007-07-06 21:43   ` Maynard Johnson
@ 2007-07-07  1:58     ` Frank Ch. Eigler
  2007-07-10 15:47       ` Maynard Johnson
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Ch. Eigler @ 2007-07-07  1:58 UTC
  To: maynardj; +Cc: dcnltc, systemtap

Maynard Johnson <maynardj@us.ibm.com> writes:

> [...]
> I work with Dave, and I told him I would cover for him on this issue
> while he's away from the office for a bit.  

Thanks!

> > [...] We could have a
> > guard expression like dtrace's /.../ - though we would probably just
> > spell it thusly:
> >     probe PROBEPOINT if (expr) { } [...]
> Yes, I think this construct could be very useful.

OK, anyone dissenting?

> > [...]
> In fact, the raw trace data is well-defined by the existing ITrace
> tool I mentioned above.  Of course, this definition is
> negotiable. [...]

My point is more whether this definition ought to be hidden, and be
hard-coded for interoperation with your itrace post-processing tool.


> > Is there an existing kernel API for management of these
> > [single-stepping] registers/fields?
> Unfortunately, not that I'm aware of. [...]

OK, then how is it done at all?  You just set saved control register
bits by hand?  Available on how many architectures?  How is the
callback received?  How does this handle issues such as more than one
itrace session operating at a time?

- FChE

* Re: proposed instruction trace support in SystemTap
  2007-07-05 19:37 ` Frank Ch. Eigler
  2007-07-06 12:46   ` grundy
  2007-07-06 21:43   ` Maynard Johnson
@ 2007-07-10 14:12   ` Dave Nomura
  2007-07-10 14:39     ` Frank Ch. Eigler
  2 siblings, 1 reply; 27+ messages in thread
From: Dave Nomura @ 2007-07-10 14:12 UTC
  To: Frank Ch. Eigler; +Cc: systemtap, Maynard Johnson, James Keniston

Frank Ch. Eigler wrote:
> Dave Nomura <dcnltc@us.ibm.com> writes:
>   
>> SINGLE_STEP/BRANCH TRAP HANDLER
>> [...]
>> probe branch_step label("branch handler 1")
>> {
>>         <do whatever you want for each branch instruction>
>>         itrace_output();        // write itrace binary output
>> }
>>
>> where "label" is a language extension to attach a name [...]
>>     
>
> Particularly, to turn the probe on and off by explicit function calls.
> This is an area we discussed at the face-to-face meeting in Ottawa
> last week, in relation to user-space probes.  The same concept could
> apply to other probe types.
>
> Regarding semantics, this is tricky business.  Turning off active
> probes is relatively simple, because even if the underlying probe API
> doesn't support instantaneous (atomic) disarming, we can simulate it
> until the API catches up (by adding an "am I supposed to be disarmed?"
> conditional to the handler).  Turning them *on* is different - we
> can't help but possibly miss a couple of events as the API catches up.
>
> Maybe this is acceptable, maybe not.  Some syntax may help tell us the
> judgement of the script programmer.
>   
The single_step and branch_step syntax identifies the handler code and 
needs to be set up before the user probe code that turns on/off 
instruction tracing.  I am assuming that, since you have access to the 
pid() function in your *.stp script, the user program is invoked 
after all of the probes have been processed, so I'm not sure I 
understand how the actual instruction tracing events would get lost.  I 
suppose it could be a big deal if you were trying to trace some very 
sensitive code, where it might be important that ALL instructions in the 
specified range are traced, although on PPC there are some reservation(?) 
instructions that cannot be traced using the single instruction trap.  
The strategy used by Performance Inspector's ITRACE is to turn off 
tracing for some number of instructions and try to re-enable tracing by 
setting up a kprobe at specific places in the kernel like switch_to, 
return from interrupt, etc.
>
> Regarding syntax, we have more options than an opaque string and
> explicit function calls to turn things on and off.  We could have a
> guard expression like dtrace's /.../ - though we would probably just
> spell it thusly:
>
>     probe PROBEPOINT if (expr) { }
>
> where expr could be something as simple as (probe_1_enabled_p), which
> better be a global variable.
>
> The compiler would analyze expr for dataflow, arrange to evaluate this
> condition whenever appropriate (after another probe writes any of its
> inputs), and arrange to promptly activate or deactivate the
> appropriate probes.  Since "promptly" may take some time, script
> programmers plopping a conditional like this in are implying consent
> to a few events being missed.
>
>
>   
>> The itrace_output() is a function that produces the raw trace data
>> that could then be post processed for consumption by various
>> performance analysis tools but the user could do something as simple
>> as printing out the PC value.
>>     
>
> Is the "raw trace data" a well-defined thing?  Why would this sort of
> hard-coded data set be desirable, as opposed to letting the programmer
> write something explicit like:
>    printf("%2b%8b%4b", cpu(), get_ticks(), pc())
> (Of course this can be hidden in a function of his own, or in an
> inspectable tapset.)
>   
The user could do the simple printf that you suggest.  The proposed 
callout to itrace_output() would only be used if you wanted more 
detailed information (like timestamp) as required by a tool like 
qtrace (a sophisticated pipeline analysis tool).  Since the instruction 
tracing will trace into the kernel, you need some indication of when this 
switch happens, and perhaps of things like switches to different threads.  
Since PI has other tools than ITRACE (tprof, for example), I'm not sure 
whether the complexity of the raw data that it generates is strictly 
needed by qtrace.  I'll have to ask the PI pros.  We would design 
itrace_output() to generate the raw information needed by analysis tools 
like qtrace and let a post-processing tool do the formatting for 
consumption by the analysis tools.
>   
>> It might be nice if there was some way to name the relay streams so
>> that they aren't intermingled.  Maybe something analogous to the
>> stream parameter to fprintf.
>>     
>
> Something similar was mentioned as desirable in the OLS2007 talk by
> Bligh / Desnoyers on Google's ktrace & LTTng.  There, the context was
> an occasional need to have separate buffers for high-volume and
> low-volume messages, so that buffer overflows did not penalize the
> smaller messages too much.  Let's think about this some more.
>
>
>   
>> The SystemTap translator would generate calls to target dependent
>> code to implement single instruction or branch trapping.  This is
>> done a variety of ways on different architectures, but generally
>> involves setting a bit in a system register to enable single
>> instruction/branch trapping.
>>     
>
> Is this sort of thing done/doable in kernel space also, or just on
> user-space threads?  Is there an existing kernel API for management of
> these registers/fields?
>   
Instruction tracing in the kernel is not something that PI ITRACE 
supports, but I don't know of any reason why we would have that 
restriction.  Maybe the single_step/branch_step would have some sort of 
syntax to allow trap handler code for kernel routines.  There is 
basically one single instruction trap handler that the stap translator 
will generate with logic to figure out what handler code to run, so I 
can't think of any reason why we wouldn't allow this.

One issue that Jim Keniston identified is that we would want some way to 
not trace any of the instructions in the kernel code associated with stap 
probes, trap handler, etc.  Your thoughts on how we might do this are 
welcome!
>
>   
>> [...] - instruction tracing enabled for a parent process id will
>> enable tracing for all of its children (threads). [...]
>>     
>
> This is a sensible behavior, though so is a per-thread alternative.
> Since the tracing flags are per-thread control registers anyway,
> I suspect we'll have to build the former on top of the latter.
>   
Yes.
>   
>> [...] INITIALIZATION/CLEANUP
>> Initialization/cleanup of the instruction tracing feature could be
>> done by insertion of a call to an itrace initialization/cleanup
>> routine in the user's begin/end probes.
>>
>> probe begin
>>         itrace_init(<some params>)
>> probe end
>>         itrace_cleanup()
>>     
>
> Neither of these should be necessary.  The existence of
> instruction-trace type probes should imply automated setup/cleanup.
>   
OK.
> - FChE
>   


-- 
Dave Nomura
LTC Linux Power Toolchain


* Re: proposed instruction trace support in SystemTap
  2007-07-10 14:12   ` Dave Nomura
@ 2007-07-10 14:39     ` Frank Ch. Eigler
  2007-07-10 20:57       ` Maynard Johnson
  2007-08-20  0:34       ` Dave Nomura
  0 siblings, 2 replies; 27+ messages in thread
From: Frank Ch. Eigler @ 2007-07-10 14:39 UTC
  To: Dave Nomura; +Cc: systemtap, Maynard Johnson, James Keniston

Dave Nomura <dcnltc@us.ibm.com> writes:

> [...]  The single_step and branch_step syntax identifies the handler
> code and needs to be set up before the user probe code that turns
> on/off instruction tracing.

That goes without saying.  Whatever setup need be done must be done before
explicit on/off operations become available to script code.

> I am assuming that since you have access to the pid() function in
> your *.stp script, that the user program is invoked after all of the
> probes have been processed, so I'm not sure I understand how the
> actual instruction tracing events would get lost.  

Consider the scenario where tracing cannot be toggled on and off
atomically, and yet the probe handler requesting this toggling prefers
not to block.  So we may have an enqueued "turn tracing back on"
request that may take a little while to satisfy.  During this time, we
may lose events.

True, this might not apply to uprobes if we choose to take advantage
of its handlers' ability to sleep.  But it would apply to other types
of probes.
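
The window in question can be simulated in a few lines of Python.  This
is a toy model with made-up names: "on" requests are queued and take
effect after a fixed latency, "off" is treated as immediate (it can be
simulated in the handler), and events that arrive while an "on" is
still pending are counted as lost.

```python
def run(timeline, latency):
    """timeline holds one of "on", "off" or "event" per tick."""
    tracing = False
    pending_on = None     # tick at which a queued "on" takes effect
    recorded = lost = 0
    for tick, item in enumerate(timeline):
        if pending_on is not None and tick >= pending_on:
            tracing, pending_on = True, None
        if item == "on":
            pending_on = tick + latency    # non-blocking: just enqueue
        elif item == "off":
            tracing, pending_on = False, None
        elif item == "event":
            if tracing:
                recorded += 1
            elif pending_on is not None:
                lost += 1                  # wanted, but not armed yet
    return recorded, lost
```

For example, run(["on", "event", "event", "event", "off", "event"], 2)
returns (2, 1): the first event falls inside the window and is lost.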

> > Is the "raw trace data" a well-defined thing?  Why would this sort of
> > hard-coded data set be desirable, as opposed to letting the programmer
> > write something explicit like:
> >    printf("%2b%8b%4b", cpu(), get_ticks(), pc())
> > (Of course this can be hidden in a function of his own, or in an
> > inspectable tapset.)
> >
> The user could do the simple printf that you suggest.  The proposed
> callout to itrace_output() would only be used if you wanted more
> detailed information (like timestamp) as required by a tool like
> qtrace [...]

It sounds like what we may need is a collection of functions that
print tracing data in formats compatible with specific post-processing
tools.  It's not "raw trace data" in the itrace sense - it is trace
data in the *qtrace* sense - and that of other tools.

> > Is this sort of thing done/doable in kernel space also, or just on
> > user-space threads?  Is there an existing kernel API for management of
> > these registers/fields?

> [...] There is basically one single instruction trap handler that
> the stap translator will generate with logic to figure out what
> handler code to run [...]

The "existing kernel API" is the key issue here.  How exactly does one
activate single-stepping traps on each of the interesting
architectures, and on multiple different kernel generations (RHEL)?
How does one hook into the handling system correctly (avoiding
interference to other consumers of trap data like gdb, uprobes)?

These questions need answers in order for systemtap to generate code
to implement this.

> One issue that Jim Keniston identified is that we would want some
> way to not trace any of the instructions in the kernel code
> associated with stap probes, trap handler, etc.  [...]

It's a complicated area with potential to easily bring a system down.
Let's stay away from it until we get some more experience with
user-space.

- FChE

* Re: proposed instruction trace support in SystemTap
  2007-07-07  1:58     ` Frank Ch. Eigler
@ 2007-07-10 15:47       ` Maynard Johnson
  0 siblings, 0 replies; 27+ messages in thread
From: Maynard Johnson @ 2007-07-10 15:47 UTC
  To: Frank Ch. Eigler; +Cc: dcnltc, systemtap

Frank Ch. Eigler wrote:
> Maynard Johnson <maynardj@us.ibm.com> writes:
> 
> 
>>[...]
>>I work with Dave, and I told him I would cover for him on this issue
>>while he's away from the office for a bit.  
> 
> 
> Thanks!
> 
> 
>>>[...] We could have a
>>>guard expression like dtrace's /.../ - though we would probably just
>>>spell it thusly:
>>>    probe PROBEPOINT if (expr) { } [...]
>>
>>Yes, I think this construct could be very useful.
> 
> 
> OK, anyone dissenting?
> 
> 
>>>[...]
>>
>>In fact, the raw trace data is well-defined by the existing ITrace
>>tool I mentioned above.  Of course, this definition is
>>negotiable. [...]
> 
> 
> My point is more whether this definition ought to be hidden, and be
> hard-coded for interoperation with your itrace post-processing tool.
As in "instruction trace tool", I envision a small number of pre-defined 
data formats that the user could select from that would provide, say, 2 
or 3 different levels of detail in the trace.  One of these formats 
would likely be a superset of the others in order to provide complete 
detail.  Examples:  1) branch-level trace data format; 2) 
instruction-level trace data format (basically, a branch-level trace 
plus load/store addresses, etc.)
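
A minimal sketch of how two such detail levels could relate, assuming a
made-up record layout: the names trace_level, trace_rec and trace_encode
are hypothetical and are not taken from ITrace, qtrace or SystemTap.  The
only point illustrated is that the instruction-level record is a superset
whose leading fields are exactly the branch-level record.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical detail levels; not taken from ITrace or SystemTap. */
enum trace_level {
    TRACE_LEVEL_BRANCH = 1,   /* branch source and target only */
    TRACE_LEVEL_INSN   = 2    /* branch fields plus a load/store address */
};

/* One record at the most detailed level.  The branch-level format uses
 * only the first two fields, so it is a strict prefix of this one. */
struct trace_rec {
    uint64_t from;        /* address of the traced (branch) instruction */
    uint64_t to;          /* address control moved to */
    uint64_t mem_addr;    /* load/store effective address, 0 if none */
    uint8_t  is_store;    /* 1 for a store, 0 for a load or no access */
};

/* Append n bytes to buf at *off and advance *off. */
static void put_bytes(uint8_t *buf, size_t *off, const void *src, size_t n)
{
    memcpy(buf + *off, src, n);
    *off += n;
}

/* Serialize rec at the requested level into buf.  Returns the number of
 * bytes written, or 0 if the level is unknown or buf is too small.
 * Fields are copied in host byte order to keep the sketch short. */
size_t trace_encode(enum trace_level level, const struct trace_rec *rec,
                    uint8_t *buf, size_t cap)
{
    size_t need;
    size_t off = 0;
    uint8_t tag = (uint8_t)level;

    if (level == TRACE_LEVEL_BRANCH)
        need = 1 + 8 + 8;           /* tag, from, to */
    else if (level == TRACE_LEVEL_INSN)
        need = 1 + 8 + 8 + 8 + 1;   /* ... plus mem_addr, is_store */
    else
        return 0;
    if (cap < need)
        return 0;

    put_bytes(buf, &off, &tag, 1);
    put_bytes(buf, &off, &rec->from, 8);
    put_bytes(buf, &off, &rec->to, 8);
    if (level == TRACE_LEVEL_INSN) {
        put_bytes(buf, &off, &rec->mem_addr, 8);
        put_bytes(buf, &off, &rec->is_store, 1);
    }
    return off;
}
```

With this layout a post-processing tool that only understands the
branch-level fields can still read the first 17 bytes of every
instruction-level record.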

These pre-defined formats should be public to facilitate development of 
post-processing tools.  Example 1: An existing pipeline analyzer tool 
uses an instruction-level trace to help performance analysts identify 
the causes of stalls.  An intermediate post-processing tool reads the 
raw trace data, fills in the gaps from the sparse trace data, and 
converts to a particular binary format (qtrace) that is then consumed by 
the pipeline analyzer tool.  Example 2: A higher level analysis tool 
could use some subset of the trace data represented in XML, which could 
be provided by an intermediate post-processing tool.

If feasible and practical, we could allow the user to specify an 
arbitrary data format.  This could be a future enhancement.
> 
> 
> 
>>>Is there an existing kernel API for management of these
>>>[single-stepping] registers/fields?
>>
>>Unfortunately, not that I'm aware of. [...]
> 
> 
> OK, then how is it done at all?  You just set saved control register
> bits by hand?  Available on how many architectures?  How is the
> callback received?  How does this handle issues such as more than one
> itrace session operating at a time?
Currently, the Performance Inspector ITrace supports i386 (Intel/AMD), 
x86_64, and PPC64.  The tracing enablement is, of course, unique per 
architecture.  For example, on PPC64, the MSR_SE or MSR_BE bit is set to 
enable single step exception or branch exception, respectively.  The 
global variable '__debugger_sstep' (declared in 
include/asm-powerpc/system.h) is set to the appropriate callback 
function.  We also use kprobes to set up handlers for things like task 
switch and process exit.  I'm not as familiar with the details for x86*, 
but I do know the main exception handler for ITrace is set by changing 
entry #1 in the interrupt vector table.
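
As a rough user-space model of the PPC64 enablement described above (not
kernel code), the sketch below only shows the bit manipulation on a saved
MSR value.  The MODEL_ names and the helper are hypothetical; the bit
positions are assumed to match the kernel's powerpc register header and
should be checked against it before being relied on.

```c
#include <stdint.h>

/* Assumed bit positions for the PowerPC MSR trace bits; verify against
 * the kernel's powerpc register definitions before relying on them. */
#define MODEL_MSR_BE (UINT64_C(1) << 9)    /* branch trace enable */
#define MODEL_MSR_SE (UINT64_C(1) << 10)   /* single-step trace enable */

enum step_mode { STEP_OFF, STEP_BRANCH, STEP_SINGLE };

/* Return a copy of a saved MSR value with exactly the requested trace
 * mode selected; all other MSR bits are left untouched. */
uint64_t msr_with_step_mode(uint64_t msr, enum step_mode mode)
{
    msr &= ~(MODEL_MSR_SE | MODEL_MSR_BE);   /* clear both trace bits */
    if (mode == STEP_SINGLE)
        msr |= MODEL_MSR_SE;
    else if (mode == STEP_BRANCH)
        msr |= MODEL_MSR_BE;
    return msr;
}
```

In a real tracer the modified value would be written back to the saved
register state of the traced task, so the trap fires once that task
resumes.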

As for handling the case of more than one ITrace session being active at 
once -- no, there's nothing that prevents that from happening.  This is 
certainly an issue that should be addressed.

Regards,
-Maynard
> 
> - FChE


* Re: proposed instruction trace support in SystemTap
  2007-07-10 14:39     ` Frank Ch. Eigler
@ 2007-07-10 20:57       ` Maynard Johnson
  2007-07-10 22:45         ` Jim Keniston
  2007-07-11  4:31         ` Ananth N Mavinakayanahalli
  2007-08-20  0:34       ` Dave Nomura
  1 sibling, 2 replies; 27+ messages in thread
From: Maynard Johnson @ 2007-07-10 20:57 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Dave Nomura, systemtap, Maynard Johnson, James Keniston

Frank Ch. Eigler wrote:

> Dave Nomura <dcnltc@us.ibm.com> writes:
> 
> 
[snip]
> 
>>>Is this sort of thing done/doable in kernel space also, or just on
>>>user-space threads?  Is there an existing kernel API for management of
>>>these registers/fields?
> 
> 
>>[...] There is basically one single instruction trap handler that
>>the stap translator will generate with logic to figure out what
>>handler code to run [...]
> 
> 
> The "existing kernel API" is the key issue here.  How exactly does one
> activate single-stepping traps on each of the interesting
> architectures, and on multiple different kernel generations (RHEL)?
> How does one hook into the handling system correctly (avoiding
> interference to other consumers of trap data like gdb, uprobes)?
I responded to some of these questions in my previous reply.  As for 
avoiding interference with other consumers of trap data, this is not 
handled in the current implementation and is TBD for the SystemTap 
proposal.  We'll investigate how other tools are handling this.  As for 
kernel debuggers, that's a pretty murky picture.  AFAIK, there isn't a 
common kernel debugger that's in mainline.  PPC64 has xmon -- don't know 
about other arch's.  We'll take a look at uprobes source to see how it 
handles this issue, but I'm not sure where to get source from.  Is it at 
http://sourceware.org/systemtap/kprobes/index.html?
> 
> These questions need answers in order for systemtap to generate code
> to implement this.
> 
> 
> 
>>One issue that Jim Keniston identified is that we would want some
>>way to not trace any of the instructions in the kernel code
>>associated with stap probes, trap handler, etc.  [...]
> 
> 
> It's a complicated area with potential to easily bring a system down.
So true.  ;-)
> Let's stay away from it until we get some more experience with
> user-space.
I think that's a good idea.  I don't see a problem in phasing in the 
kernel tracing functionality in a later version.

Regards,
-Maynard
> 
> 
> - FChE


* Re: proposed instruction trace support in SystemTap
  2007-07-10 20:57       ` Maynard Johnson
@ 2007-07-10 22:45         ` Jim Keniston
  2007-07-11  4:31         ` Ananth N Mavinakayanahalli
  1 sibling, 0 replies; 27+ messages in thread
From: Jim Keniston @ 2007-07-10 22:45 UTC (permalink / raw)
  To: maynardj
  Cc: Frank Ch. Eigler, Dave Nomura, systemtap, Maynard Johnson,
	James Keniston

On Tue, 2007-07-10 at 15:57 -0500, Maynard Johnson wrote:
> ... We'll take a look at uprobes source to see how it 
> handles this issue, but I'm not sure where to get source from.  Is it at 
> http://sourceware.org/systemtap/kprobes/index.html?
> > 

Nope.  The most recently published uprobes source (a 3-patch set) is
still at http://sources.redhat.com/ml/systemtap/2007-q2/msg00399.html.

A more recent version, which allows uprobes to be built as a module, has
been kicking around IBM for a couple of weeks; but the modularized
uprobes requires an export that's not provided by certain kernels of
interest (including Linus's).

Jim

...
> 
> Regards,
> -Maynard

* Re: proposed instruction trace support in SystemTap
  2007-07-10 20:57       ` Maynard Johnson
  2007-07-10 22:45         ` Jim Keniston
@ 2007-07-11  4:31         ` Ananth N Mavinakayanahalli
  1 sibling, 0 replies; 27+ messages in thread
From: Ananth N Mavinakayanahalli @ 2007-07-11  4:31 UTC (permalink / raw)
  To: Maynard Johnson
  Cc: Frank Ch. Eigler, Dave Nomura, systemtap, Maynard Johnson,
	James Keniston

On Tue, Jul 10, 2007 at 03:57:39PM -0500, Maynard Johnson wrote:
> Frank Ch. Eigler wrote:
> 
> >Dave Nomura <dcnltc@us.ibm.com> writes:
> >
> >
> [snip]
> >
> >>>Is this sort of thing done/doable in kernel space also, or just on
> >>>user-space threads?  Is there an existing kernel API for management of
> >>>these registers/fields?
> >
> >
> >>[...] There is basically one single instruction trap handler that
> >>the stap translator will generate with logic to figure out what
> >>handler code to run [...]
> >
> >
> >The "existing kernel API" is the key issue here.  How exactly does one
> >activate single-stepping traps on each of the interesting
> >architectures, and on multiple different kernel generations (RHEL)?
> >How does one hook into the handling system correctly (avoiding
> >interference to other consumers of trap data like gdb, uprobes)?
> I responded to some of these questions in my previous reply.  As for 
> avoiding interference with other consumers of trap data, this is not 
> handled in the current implementation and is TBD for the SystemTap 
> proposal.  We'll investigate how other tools are handling this.  As for 
> kernel debuggers, that's a pretty murky picture.  AFAIK, there isn't a 
> common kernel debugger that's in mainline.  PPC64 has xmon -- don't know 
> about other arch's.  We'll take a look at uprobes source to see how it 
> handles this issue, but I'm not sure where to get source from.  Is it at 
> http://sourceware.org/systemtap/kprobes/index.html?

Kprobes and xmon on powerpc use the same trap variant instruction. A
scheme is in place to pass on traps that weren't caused by kprobes.
This is done using the notifier infrastructure in the kernel... see
kprobes_exceptions_notify(). If kprobes gets notified of a trap caused
by an xmon breakpoint, it'll just return NOTIFY_DONE, indicating that
it didn't find an entry corresponding to the trap address in its table
and hence that the trap is meant for some other consumer - xmon or any
other interested tool. The notifier mechanism then invokes the next
interested party in the chain (if one is registered).
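
The hand-off described above can be modelled in user space as follows.
This is only a sketch of the idea, not the kernel's notifier code: the
consumer struct, the MODEL_ return codes and dispatch_trap() are
hypothetical names invented for the illustration.

```c
#include <stddef.h>

/* Return codes modelled on the notifier convention described above;
 * the names and values are local to this sketch. */
enum { MODEL_NOTIFY_DONE = 0, MODEL_NOTIFY_STOP = 1 };

/* One registered trap consumer (e.g. kprobes or xmon) together with the
 * trap addresses it has planted. */
struct consumer {
    const char *name;
    const unsigned long *addrs;   /* trap addresses this consumer owns */
    size_t naddrs;
    struct consumer *next;        /* next party in the chain, or NULL */
};

/* A consumer claims the trap only if the address is in its own table;
 * otherwise it reports "not mine" so the chain can continue. */
static int consumer_notify(const struct consumer *c, unsigned long trap_addr)
{
    for (size_t i = 0; i < c->naddrs; i++)
        if (c->addrs[i] == trap_addr)
            return MODEL_NOTIFY_STOP;
    return MODEL_NOTIFY_DONE;
}

/* Walk the chain in registration order and return the name of the first
 * consumer that claims the trap, or NULL if nobody does. */
const char *dispatch_trap(const struct consumer *chain,
                          unsigned long trap_addr)
{
    for (const struct consumer *c = chain; c != NULL; c = c->next)
        if (consumer_notify(c, trap_addr) == MODEL_NOTIFY_STOP)
            return c->name;
    return NULL;
}
```

An instruction tracer that registers on the same chain would follow the
same rule: claim only the traps it caused and pass everything else on.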

Ananth

* Re: proposed instruction trace support in SystemTap
  2007-07-10 14:39     ` Frank Ch. Eigler
  2007-07-10 20:57       ` Maynard Johnson
@ 2007-08-20  0:34       ` Dave Nomura
  2007-08-20  0:37         ` Roland McGrath
  2007-08-23 22:10         ` proposed instruction trace support in SystemTap Dave Nomura
  1 sibling, 2 replies; 27+ messages in thread
From: Dave Nomura @ 2007-08-20  0:34 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap, Maynard Johnson, James Keniston

I've been looking into the kernel API for handling single stepping and
haven't really found anything.  ptrace() is used by gdb but its usage
model might be overly restrictive for what we want:  we would have to
have a parent process that uses ptrace() to trace its children.
ptrace() also does not trace into the kernel, which is an ITRACE requirement.
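
For reference, the parent/child usage model that ptrace() imposes looks
roughly like the sketch below.  It is a minimal illustration for Linux,
not part of any proposed SystemTap design; count_single_steps() is a
hypothetical helper name.  Every traced instruction costs a stop in the
child plus a wakeup of the parent, which is where the overhead comes from.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, single-step it with ptrace() until it exits, and return
 * the number of single-step stops observed (-1 on error). */
long count_single_steps(void)
{
    int status;
    long steps = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced by the parent, then stop so the
         * parent can take control before the traced code runs. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);               /* stands in for the traced program */
    }

    /* Parent: wait for the child's initial stop. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    for (;;) {
        /* Resume the child for exactly one user-space instruction. */
        if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) < 0) {
            kill(child, SIGKILL);
            waitpid(child, &status, 0);
            return -1;
        }
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;              /* child is gone, tracing is over */
        steps++;                /* one stop per executed instruction */
    }
    return steps;
}
```

Only user-space instructions of the child are seen; anything the child
does inside a system call is invisible to the tracer.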

I think the requirement of tracing into the kernel is only needed in
some scenarios and may only be needed for the ITRACE application of
SystemTap instruction tracing.  In its most general form Performance
Inspector ITRACE allows tracing into the kernel and tracing of a whole
range of processes.  Due to these requirements it places usage
constraints on the user that require exclusive access to a machine.  In
the more common SystemTap instruction tracing scenario, where only a
single process is being traced and only user code is traced, maybe it
would be appropriate to use ptrace() to do the single stepping.

It has already been suggested that we have different APIs for ITRACE vs.
simpler (non-kernel tracing) instruction tracing modes so it might be a
simple matter of telling the SystemTap translator what kind of trap
handler to generate (or referenced from the runtime stap scripts).  In
the non-kernel-tracing-single-process scenario just the normal process
switch management of registers will handle restoring the single step
trap bit, or if ptrace() (or possibly utrace()).

The ITRACE-kernel-tracing scenario might require similar usage 
restrictions as PI ITRACE, and we simply would require that you aren't 
using other kernel debuggers (xmon, kgdb,...) while trying to do an 
ITRACE to avoid conflict over the kernel resources needed for 
instruction tracing.   Alternatively, a kernel API (if it doesn't 
already exist) for handling these kernel resources could be created.  I 
have heard that xmon and kgdb both use the __debugger_sstep() trap 
handler pointer.
Frank Ch. Eigler wrote:
>
>> [...] There is basically one single instruction trap handler that
>> the stap translator will generate with logic to figure out what
>> handler code to run [...]
>>     
>
> The "existing kernel API" is the key issue here.  How exactly does one
> activate single-stepping traps on each of the interesting
> architectures, and on multiple different kernel generations (RHEL)?
> How does one hook into the handling system correctly (avoiding
> interference to other consumers of trap data like gdb, uprobes)?
>
> These questions need answers in order for systemtap to generate code
> to implement this.
>
>
>   
> - FChE
>
>   

-- 
Dave Nomura
LTC Linux Power Toolchain

* Re: proposed instruction trace support in SystemTap
  2007-08-20  0:34       ` Dave Nomura
@ 2007-08-20  0:37         ` Roland McGrath
  2007-08-25 11:34           ` Dave Nomura
                             ` (2 more replies)
  2007-08-23 22:10         ` proposed instruction trace support in SystemTap Dave Nomura
  1 sibling, 3 replies; 27+ messages in thread
From: Roland McGrath @ 2007-08-20  0:37 UTC (permalink / raw)
  To: dcnltc; +Cc: Frank Ch. Eigler, systemtap, Maynard Johnson, James Keniston

For user-mode stepping (all you can do via ptrace), this is what the utrace
in-kernel APIs give you.  The in-kernel case has enough different issues
that I think it's appropriate to consider it an entirely separate case.
For that, kprobes already has its fingers in this area of machine-specific
code.  It might make most sense for in-kernel stepping to be an extension
of the kprobes code.  OTOH, with the hw_breakpoint (nee kwatch) work by
Alan Stern <stern@rowland.harvard.edu> we have a second in-kernel case that
(on some machines) wants to get involved with single-stepping.  Perhaps it
makes sense to consolidate the efforts on some shared low-level part that
deals with the stepping part.  Or there may not be enough to be done there
that anything beyond current machine-specific calls and trap notifiers are
really required.  (Off hand I think at least some kind of coordination will
be required to avoid these three things stepping on each other's toes.)

Thanks,
Roland

* Re: proposed instruction trace support in SystemTap
  2007-08-20  0:34       ` Dave Nomura
  2007-08-20  0:37         ` Roland McGrath
@ 2007-08-23 22:10         ` Dave Nomura
  1 sibling, 0 replies; 27+ messages in thread
From: Dave Nomura @ 2007-08-23 22:10 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Maynard Johnson, systemtap, Maynard Johnson, James Keniston

From a conversation with Paul Mackerras I recently learned that
__debugger_sstep() is the PPC trap handler pointer used only for kernel
debugging, and its use assumes that only one kernel debugger is active
at a time.  Although I think a kernel API could be created to coordinate
use of this function pointer, I don't think one exists.  It would
probably be prudent to put restrictions on instruction tracing of kernel
code in the SystemTap-itrace feature.

Although ptrace() probably is not suitable for instruction tracing of
user code, if for no other reason than performance issues, I have been
looking into some of the documentation of utrace, and am wondering if it
might be suitable.

I
Dave Nomura wrote:
> I've been looking into the kernel API for handling single stepping and
> haven't really found anything.  ptrace() is used by gdb but its usage
> model might be overly restrictive for what we want:  we would have to
> have a parent process that uses ptrace() to trace its children.
> ptrace() also does not trace into the kernel, which is an ITRACE
> requirement.
>
> I think the requirement of tracing into the kernel is only needed in
> some scenarios and may only be needed for the ITRACE application of
> SystemTap instruction tracing.  In its most general form Performance
> Inspector ITRACE allows tracing into the kernel and tracing of a whole
> range of processes.  Due to these requirements it places usage
> constraints on the user that require exclusive access to a machine.
> In the more common SystemTap instruction tracing scenario, where only
> a single process is being traced and only user code is traced, maybe
> it would be appropriate to use ptrace() to do the single stepping.
>
> It has already been suggested that we have different APIs for ITRACE
> vs. simpler (non-kernel tracing) instruction tracing modes so it might
> be a simple matter of telling the SystemTap translator what kind of
> trap handler to generate (or referenced from the runtime stap
> scripts).  In the non-kernel-tracing-single-process scenario just the
> normal process switch management of registers will handle restoring
> the single step trap bit, or if ptrace() (or possibly utrace()).
>
> The ITRACE-kernel-tracing scenario might require similar usage 
> restrictions as PI ITRACE, and we simply would require that you aren't 
> using other kernel debuggers (xmon, kgdb,...) while trying to do an 
> ITRACE to avoid conflict over the kernel resources needed for 
> instruction tracing.   Alternatively, a kernel API (if it doesn't 
> already exist) for handling these kernel resources could be created.  
> I have heard that xmon and kgdb both use the __debugger_sstep() trap 
> handler pointer.
> Frank Ch. Eigler wrote:
>>
>>> [...] There is basically one single instruction trap handler that
>>> the stap translator will generate with logic to figure out what
>>> handler code to run [...]
>>>     
>>
>> The "existing kernel API" is the key issue here.  How exactly does one
>> activate single-stepping traps on each of the interesting
>> architectures, and on multiple different kernel generations (RHEL)?
>> How does one hook into the handling system correctly (avoiding
>> interference to other consumers of trap data like gdb, uprobes)?
>>
>> These questions need answers in order for systemtap to generate code
>> to implement this.
>>
>>
>>   - FChE
>>
>>   
>
>

* Re: proposed instruction trace support in SystemTap
  2007-08-20  0:37         ` Roland McGrath
@ 2007-08-25 11:34           ` Dave Nomura
  2007-08-29 14:57             ` Frank Ch. Eigler
  2007-08-29 15:40           ` proposed instruction trace support in SystemTap Dave Nomura
  2007-09-06  2:57           ` using utrace for instruction tracing Dave Nomura
  2 siblings, 1 reply; 27+ messages in thread
From: Dave Nomura @ 2007-08-25 11:34 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Frank Ch. Eigler, systemtap, Maynard Johnson, James Keniston

It appears that ptrace() wouldn't be able to deliver acceptable 
performance for instruction tracing.  Do you think that utrace would be 
a reasonable alternative?
Roland McGrath wrote:
> For user-mode stepping (all you can do via ptrace), this is what the utrace
> in-kernel APIs give you.  

* Re: proposed instruction trace support in SystemTap
  2007-08-25 11:34           ` Dave Nomura
@ 2007-08-29 14:57             ` Frank Ch. Eigler
  2007-08-30  5:43               ` kernel API for in-kernel single stepping Dave Nomura
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Ch. Eigler @ 2007-08-29 14:57 UTC (permalink / raw)
  To: Dave Nomura; +Cc: Roland McGrath, systemtap, Maynard Johnson, James Keniston

Dave Nomura <dcnltc@us.ibm.com> writes:

> It appears that ptrace() wouldn't be able to deliver acceptable
> performance for instruction tracing.  Do you think that utrace would
> be a reasonable alternative?

That's what Roland is saying.  The next question is "who shall
prototype a single-stepping-on-top-of-utrace kernel module?".  With
such a prototype in hand, plopping support into systemtap is a small
step beyond.

- FChE

* Re: proposed instruction trace support in SystemTap
  2007-08-20  0:37         ` Roland McGrath
  2007-08-25 11:34           ` Dave Nomura
@ 2007-08-29 15:40           ` Dave Nomura
  2007-08-29 16:25             ` Frank Ch. Eigler
  2007-09-06  2:57           ` using utrace for instruction tracing Dave Nomura
  2 siblings, 1 reply; 27+ messages in thread
From: Dave Nomura @ 2007-08-29 15:40 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Frank Ch. Eigler, systemtap, Maynard Johnson

Would the performance of utrace be acceptable for doing instruction 
tracing?  It sounds like a lot of the signal overhead of ptrace is 
eliminated.
Would utrace only be useful for user-mode stepping, or could it somehow 
handle the in-kernel case?
Roland McGrath wrote:
> For user-mode stepping (all you can do via ptrace), this is what the utrace
> in-kernel APIs give you.  The in-kernel case has enough different issues
> that I think it's appropriate to consider it an entirely separate case.
> For that, kprobes already has its fingers in this area of machine-specific
> code.  It might make most sense for in-kernel stepping to be an extension
> of the kprobes code.  OTOH, with the hw_breakpoint (nee kwatch) work by
> Alan Stern <stern@rowland.harvard.edu> we have a second in-kernel case that
> (on some machines) wants to get involved with single-stepping.  Perhaps it
> makes sense to consolidate the efforts on some shared low-level part that
> deals with the stepping part. 
>  Or there may not be enough to be done there
> that anything beyond current machine-specific calls and trap notifiers are
> really required.  (Off hand I think at least some kind of coordination will
> be required to avoid these three things stepping on each other's toes.)
>
>
> Thanks,
> Roland
>
>   


-- 
Dave Nomura
LTC Linux Power Toolchain


* Re: proposed instruction trace support in SystemTap
  2007-08-29 15:40           ` proposed instruction trace support in SystemTap Dave Nomura
@ 2007-08-29 16:25             ` Frank Ch. Eigler
  0 siblings, 0 replies; 27+ messages in thread
From: Frank Ch. Eigler @ 2007-08-29 16:25 UTC (permalink / raw)
  To: Dave Nomura; +Cc: systemtap, Maynard Johnson

Hi -

On Wed, Aug 29, 2007 at 08:28:55AM -0700, Dave Nomura wrote:
> Would the performance of utrace be acceptable for doing instruction 
> tracing?  

Probably.

> It sounds like a lot of the signal overhead of ptrace is eliminated.

Yes.

> Would utrace only be useful for user-mode stepping, 

Yes.

> or could it somehow handle the in-kernel case?

No.

- FChE

* kernel API for in-kernel single stepping
  2007-08-29 14:57             ` Frank Ch. Eigler
@ 2007-08-30  5:43               ` Dave Nomura
  2007-08-30 13:05                 ` Paul Mackerras
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Nomura @ 2007-08-30  5:43 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Frank Ch. Eigler, Roland McGrath, systemtap, Maynard Johnson,
	James Keniston

Paul,
   It looks like you have vast experience with single stepping in the
kernel (sstep.[hc]).  In a previous mail you mentioned that there is
only a "scout's honor" convention for use of the kernel's single step
trapping mechanism, followed by the kernel debuggers, with only one
client at a time using the resources.  With the various kernel
debuggers, kprobes, itrace, and maybe others trying to share these
resources, do you think it is time to develop some sort of kernel single
stepping API?  Frank is requesting this API before changing SystemTap
to support single step traps.

I'm not an experienced kernel developer so this is way beyond my 
expertise.  Is this something you would consider tackling?
Roland?  Can you suggest anyone else?

If this is not feasible/advisable then we'd like to know so that we can 
move forward on the instruction tracing support in SystemTap.

Dave

Frank Ch. Eigler wrote:
> Dave Nomura <dcnltc@us.ibm.com> writes:
>
>   
>> It appears that ptrace() wouldn't be able to deliver acceptable
>> performance for instruction tracing.  Do you think that utrace would
>> be a reasonable alternative?
>>     
>
> That's what Roland is saying.  The next question is "who shall
> prototype a single-stepping-on-top-of-utrace kernel module?".  With
> such a prototype in hand, plopping support into systemtap is a small
> step beyond.
>
> - FChE

Roland McGrath (roland@redhat.com) writes:
> For user-mode stepping (all you can do via ptrace), this is what the utrace
> in-kernel APIs give you.  The in-kernel case has enough different issues
> that I think it's appropriate to consider it an entirely separate case.
> For that, kprobes already has its fingers in this area of machine-specific
> code.  It might make most sense for in-kernel stepping to be an extension
> of the kprobes code.  OTOH, with the hw_breakpoint (nee kwatch) work by
> Alan Stern <stern@rowland.harvard.edu> we have a second in-kernel case that
> (on some machines) wants to get involved with single-stepping.  Perhaps it
> makes sense to consolidate the efforts on some shared low-level part that
> deals with the stepping part.  Or there may not be enough to be done there
> that anything beyond current machine-specific calls and trap notifiers are
> really required.  (Off hand I think at least some kind of coordination will
> be required to avoid these three things stepping on each other's toes.)
>
>
> Thanks,
> Roland

-- 
Dave Nomura
LTC Linux Power Toolchain

* Re: kernel API for in-kernel single stepping
  2007-08-30  5:43               ` kernel API for in-kernel single stepping Dave Nomura
@ 2007-08-30 13:05                 ` Paul Mackerras
  2007-09-04  3:05                   ` Frank Ch. Eigler
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Mackerras @ 2007-08-30 13:05 UTC (permalink / raw)
  To: dcnltc
  Cc: Frank Ch. Eigler, Roland McGrath, systemtap, Maynard Johnson,
	James Keniston

Dave Nomura writes:

>    It looks like you have vast experience with single stepping in the
> kernel (sstep.[hc]).  In a previous mail you mentioned that there is
> only a "scout's honor" convention for use of the kernel's single step
> trapping mechanism, followed by the kernel debuggers, with only one
> client at a time using the resources.  With the various kernel
> debuggers, kprobes, itrace, and maybe others trying to share these
> resources, do you think it is time to develop some sort of kernel
> single stepping API?  Frank is requesting this API before changing
> SystemTap to support single step traps.

At present kprobes sets the MSR_SE bit in the MSR when it wants to
single-step, and uses the notify_die infrastructure as the way it gets
notified when the single-step trap occurs.  It would be reasonable for
other things that want to use single-stepping to do the same, i.e. set
MSR_SE when they want to single-step, and use the notify_die stuff to
get control back when the single-step trap occurs.

In other words the API for single-stepping is just the notify_die
stuff.  We don't get very formal or elaborate about APIs in the
kernel, preferring to extend a simple API when a need is shown rather
than designing an elaborate API up front that attempts to cater for
all possible needs.

And yes this is saying that itrace should use notify_die.  The
debugger_sstep hook is really only for in-kernel debuggers.  If
notify_die is not sufficient for itrace, let me know and we'll work
something else out.

Regards,
Paul.

* Re: kernel API for in-kernel single stepping
  2007-08-30 13:05                 ` Paul Mackerras
@ 2007-09-04  3:05                   ` Frank Ch. Eigler
  2007-09-05  5:02                     ` Dave Nomura
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Ch. Eigler @ 2007-09-04  3:05 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: dcnltc, Roland McGrath, systemtap, Maynard Johnson, James Keniston

Hi, Paul -


> [...]
> > [...]  With the various kernel debuggers, kprobes, itrace, and
> > maybe others trying to share these
> > [MSR/single-stepping/notify_die] resources do you think it is time
> > to develop some sort of kernel single stepping API?  Frank is
> > requesting this API before changing SystemTap to support single
> > step traps.

Actually, I have asked only about user-space single-stepping, possibly
based on the utrace API.  Broad kernel-space single-stepping is risky
enough not to attempt yet.

> At present kprobes sets the MSR_SE bit in the MSR when it wants to
> single-step, and uses the notify_die infrastructure as the way it
> gets notified when the single-step trap occurs.  [...]  In other
> words the API for single-stepping is just the notify_die stuff.
> [...]

We can attempt this later.

- FChE

* Re: kernel API for in-kernel single stepping
  2007-09-04  3:05                   ` Frank Ch. Eigler
@ 2007-09-05  5:02                     ` Dave Nomura
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Nomura @ 2007-09-05  5:02 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Paul Mackerras, Roland McGrath, systemtap, Maynard Johnson,
	James Keniston

I'd like to summarize where we stand on instruction tracing support in 
SystemTap.

The key blocking point on the original proposal was the lack of some 
kernel API to support single-step trapping, the basis of instruction 
tracing.

Roland suggested that utrace would be a reasonable kernel API to use for
single-step tracing of user code, and that kernel instruction tracing be
treated as a separate case.  Paul felt that the
MSR/single-step/notify_die infrastructure that kprobes uses should be an
adequate API for itrace to use, and if not, maybe something could be
done to make it so.  Frank suggested that we defer that capability until
we get the user code tracing sorted out.

Thanks to all of you for your input!

I will try to add a utrace-based single-stepping mechanism to my
SystemTap instruction trace prototype and see how the performance
compares with the Performance Inspector ITRACE strategy (which uses
__debugger_sstep() as a trap handler without any other layers of overhead).
Frank Ch. Eigler wrote:
> Hi, Paul -
>
>
>   
>> [...]
>>     
>>> [...]  With the various kernel debuggers, kprobes, itrace, and
>>> maybe others trying to share these
>>> [MSR/single-stepping/notify_die] resources do you think it is time
>>> to develop some sort of kernel single stepping API?  Frank is
>>> requesting this API before changing SystemTap to support single
>>> step traps.
>>>       
>
> Actually, I have asked only about user-space single-stepping, possibly
> based on the utrace API.  Broad kernel-space single-stepping is risky
> enough not to attempt yet.
>
>   
>> At present kprobes sets the MSR_SE bit in the MSR when it wants to
>> single-step, and uses the notify_die infrastructure as the way it
>> gets notified when the single-step trap occurs.  [...]  In other
>> words the API for single-stepping is just the notify_die stuff.
>> [...]
>>     
>
> We can attempt this later.
>
> - FChE
>   

* using utrace for instruction tracing
  2007-08-20  0:37         ` Roland McGrath
  2007-08-25 11:34           ` Dave Nomura
  2007-08-29 15:40           ` proposed instruction trace support in SystemTap Dave Nomura
@ 2007-09-06  2:57           ` Dave Nomura
  2007-09-06 14:05             ` Jim Keniston
  2 siblings, 1 reply; 27+ messages in thread
From: Dave Nomura @ 2007-09-06  2:57 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Frank Ch. Eigler, systemtap, Maynard Johnson, James Keniston

I notice that the utrace documentation says that single-stepping is only
supported if ARCH_HAS_SINGLE_STEP/ARCH_HAS_BLOCK_STEP is supported.  My
googling found a note you sent that says it is supported on ia64 and ppc,
but not on x86 yet.  Any idea if there is any work underway to support
this on x86?

Frank: Is this unacceptable, or can we live with no user instruction
tracing on x86 until support is added?  I'm not very familiar with how
PI ITRACE does single step tracing on x86 for kernel/user code tracing,
but I know it is significantly different from PPC.

* Re: using utrace for instruction tracing
  2007-09-06  2:57           ` using utrace for instruction tracing Dave Nomura
@ 2007-09-06 14:05             ` Jim Keniston
  2007-09-06 18:28               ` Dave Nomura
  0 siblings, 1 reply; 27+ messages in thread
From: Jim Keniston @ 2007-09-06 14:05 UTC (permalink / raw)
  To: dcnltc; +Cc: systemtap

On Wed, 2007-09-05 at 15:46 -0700, Dave Nomura wrote:
> I notice that the utrace documentation says that single-stepping is only 
> supported if ARCH_HAS_SINGLE_STEP/ARCH_HAS_BLOCK_STEP is defined.  My 
> googling found a note you sent that says it is supported on ia64 and ppc, 
> but not on x86 yet.  Any idea if there is any work underway to support 
> this on x86?
> 
> Frank: Is this unacceptable, or can we live with no user instruction 
> tracing on x86 until support is added?  I'm not very familiar with how 
> PI ITRACE does single-step tracing on x86 for kernel/user code tracing, 
> but I know it is significantly different from the PPC.

I'm not sure what you're looking at, but utrace definitely supports
single-stepping on x86 -- and at least ppc64, x86_64, and s390 as well,
since uprobes runs on all those and uses single-stepping.

Jim


* Re: using utrace for instruction tracing
  2007-09-06 14:05             ` Jim Keniston
@ 2007-09-06 18:28               ` Dave Nomura
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Nomura @ 2007-09-06 18:28 UTC (permalink / raw)
  To: Jim Keniston; +Cc: systemtap, Roland McGrath

This is what I found when I googled "ARCH_HAS_SINGLE_STEP x86 utrace":
http://lkml.org/lkml/2007/2/21/264
Date:    Wed, 21 Feb 2007 13:18:11 -0500
From:    Jeff Dike
Subject: Re: [PATCH] UML utrace support, step 1

......

> > +#define ARCH_HAS_SINGLE_STEP	(1)
> 
> Note you'll eventually want to define the block-step macro and functions
> depending on subarch.  (ia64 supports it, and x86 one day will.)

I guess I didn't read the comment carefully enough.  I think this mail 
is from a guy implementing utrace for x86 and is saying that it is block 
stepping, not single-stepping, that isn't supported on x86 yet.

Jim Keniston wrote:
> On Wed, 2007-09-05 at 15:46 -0700, Dave Nomura wrote:
>   
>> I notice that the utrace documentation says that single-stepping is only 
>> supported if ARCH_HAS_SINGLE_STEP/ARCH_HAS_BLOCK_STEP is defined.  My 
>> googling found a note you sent that says it is supported on ia64 and ppc, 
>> but not on x86 yet.  Any idea if there is any work underway to support 
>> this on x86?
>>
>> Frank: Is this unacceptable, or can we live with no user instruction 
>> tracing on x86 until support is added?  I'm not very familiar with how 
>> PI ITRACE does single-step tracing on x86 for kernel/user code tracing, 
>> but I know it is significantly different from the PPC.
>>     
>
> I'm not sure what you're looking at, but utrace definitely supports
> single-stepping on x86 -- and at least ppc64, x86_64, and s390 as well,
> since uprobes runs on all those and uses single-stepping.
>
> Jim
>
>   


-- 
Dave Nomura
LTC Linux Power Toolchain



Thread overview: 27+ messages
2007-07-02 23:01 proposed instruction trace support in SystemTap Dave Nomura
2007-07-05 19:37 ` Frank Ch. Eigler
2007-07-06 12:46   ` grundy
2007-07-06 14:59     ` Frank Ch. Eigler
2007-07-06 21:43   ` Maynard Johnson
2007-07-07  1:58     ` Frank Ch. Eigler
2007-07-10 15:47       ` Maynard Johnson
2007-07-10 14:12   ` Dave Nomura
2007-07-10 14:39     ` Frank Ch. Eigler
2007-07-10 20:57       ` Maynard Johnson
2007-07-10 22:45         ` Jim Keniston
2007-07-11  4:31         ` Ananth N Mavinakayanahalli
2007-08-20  0:34       ` Dave Nomura
2007-08-20  0:37         ` Roland McGrath
2007-08-25 11:34           ` Dave Nomura
2007-08-29 14:57             ` Frank Ch. Eigler
2007-08-30  5:43               ` kernel API for in-kernel single stepping Dave Nomura
2007-08-30 13:05                 ` Paul Mackerras
2007-09-04  3:05                   ` Frank Ch. Eigler
2007-09-05  5:02                     ` Dave Nomura
2007-08-29 15:40           ` proposed instruction trace support in SystemTap Dave Nomura
2007-08-29 16:25             ` Frank Ch. Eigler
2007-09-06  2:57           ` using utrace for instruction tracing Dave Nomura
2007-09-06 14:05             ` Jim Keniston
2007-09-06 18:28               ` Dave Nomura
2007-08-23 22:10         ` proposed instruction trace support in SystemTap Dave Nomura
2007-07-06 21:39 ` Maynard Johnson
