public inbox for systemtap@sourceware.org
* patches to actually use markers?
@ 2007-10-29 19:26 David Smith
  2007-10-29 22:05 ` Mathieu Desnoyers
  0 siblings, 1 reply; 20+ messages in thread
From: David Smith @ 2007-10-29 19:26 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: ltt-dev, Systemtap List

Mathieu,

Now that the markers facility itself has made it into the kernel, do you
have plans to send patches that actually use markers to lkml?

For systemtap's use, we'd like to get some actual markers in the 
upstream kernel.  Off the top of my head, we might start with adding 
markers to system calls (sys_*) that contain the system call's argument(s).

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: patches to actually use markers?
  2007-10-29 19:26 patches to actually use markers? David Smith
@ 2007-10-29 22:05 ` Mathieu Desnoyers
  2007-10-31 16:29   ` David Smith
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-10-29 22:05 UTC (permalink / raw)
  To: David Smith; +Cc: ltt-dev, Systemtap List

* David Smith (dsmith@redhat.com) wrote:
> Mathieu,
> 
> Now that the markers facility itself has made it in the kernel, do you 
> have plans on trying to send patches that actually use markers to lkml?
> 
> For systemtap's use, we'd like to get some actual markers in the 
> upstream kernel.  Off the top of my head, we might start with adding 
> markers to system calls (sys_*) that contain the system call's argument(s).
> 

Hi David,

Yes, we have something similar in LTTng, we instrument many widely used
system calls to get the detailed arguments.

Do you want to start having a look at my instrumentation patchset ?
Those are the 
lttng-instrumentation-*.patch patches available in the following
tarball:

http://ltt.polymtl.ca/lttng/patch-2.6.23-mm1-lttng-0.10-pre8.tar.bz2

The patches that you may find interesting to comment on are:

lttng-kernel-trace-thread-flag-*
  These patches add a thread flag for kernel-wide syscall trace
  activation.
  Note that I would gladly accept some help with the
    lttng-kernel-trace-thread-flag-ia64.patch
    lttng-kernel-trace-thread-flag-s390.patch
    lttng-instrumentation-s390.patch
  They need to add a 9th thread flag bit, which has to be checked by
  an instruction limited to 8 bits on these architectures.

lttng-instrumentation-*
  Actual markers. It also includes assembly code changes to use the
  thread flags for syscall_trace.
  Some architectures do not yet have a complete architecture-specific
  marker set.

It's a good thing that we start having a discussion about these marker
sites at this point.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* Re: patches to actually use markers?
  2007-10-29 22:05 ` Mathieu Desnoyers
@ 2007-10-31 16:29   ` David Smith
  2007-10-31 16:58     ` Mathieu Desnoyers
  2007-11-16 19:13     ` David Smith
  0 siblings, 2 replies; 20+ messages in thread
From: David Smith @ 2007-10-31 16:29 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: ltt-dev, Systemtap List

Mathieu,

Thanks for the pointers, I'll take a look at your patches.


-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: patches to actually use markers?
  2007-10-31 16:29   ` David Smith
@ 2007-10-31 16:58     ` Mathieu Desnoyers
  2007-10-31 18:12       ` Mathieu Desnoyers
  2007-11-16 19:13     ` David Smith
  1 sibling, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-10-31 16:58 UTC (permalink / raw)
  To: David Smith; +Cc: ltt-dev, Systemtap List

* David Smith (dsmith@redhat.com) wrote:
> Mathieu,
> 
> Thanks for the pointers, I'll take a look at your patches.

Please get the latest version then :

http://ltt.polymtl.ca/lttng/patch-2.6.23-mm1-lttng-0.10-pre11.tar.bz2

The patches I have are :

(no patch header yet)
#kernel trace thread flag
lttng-kernel-trace-thread-flag-alpha.patch
lttng-kernel-trace-thread-flag-arm.patch
lttng-kernel-trace-thread-flag-avr32.patch
lttng-kernel-trace-thread-flag-blackfin.patch
lttng-kernel-trace-thread-flag-cris.patch
lttng-kernel-trace-thread-flag-frv.patch
lttng-kernel-trace-thread-flag-h8300.patch
lttng-kernel-trace-thread-flag-i386.patch
lttng-kernel-trace-thread-flag-ia64.patch #FIXME
lttng-kernel-trace-thread-flag-m32r.patch
lttng-kernel-trace-thread-flag-m68k.patch
lttng-kernel-trace-thread-flag-m68knommu.patch
lttng-kernel-trace-thread-flag-mips.patch
lttng-kernel-trace-thread-flag-parisc.patch
lttng-kernel-trace-thread-flag-powerpc.patch
lttng-kernel-trace-thread-flag-s390.patch #FIXME
lttng-kernel-trace-thread-flag-sh.patch
lttng-kernel-trace-thread-flag-sh64.patch
lttng-kernel-trace-thread-flag-sparc.patch
lttng-kernel-trace-thread-flag-sparc64.patch
lttng-kernel-trace-thread-flag-um.patch
lttng-kernel-trace-thread-flag-v850.patch
fix-x86_64-sysenter-trace-race.patch
lttng-kernel-trace-thread-flag-x86_64.patch
lttng-kernel-trace-thread-flag-xtensa.patch
lttng-kernel-trace-thread-flag-api.patch

(with patch headers)
lttng-instrument-kernelh.patch # NOT FOR UPSTREAM
#
lttng-instrumentation-fs.patch
lttng-instrumentation-ipc.patch
lttng-instrumentation-kernel.patch
lttng-instrumentation-mm.patch
lttng-instrumentation-net.patch

And I could wait before submitting the arch specific patches I have (or
not?).  What do you think ?

lttng-instrumentation-arm.patch
lttng-instrumentation-i386.patch
lttng-instrumentation-mips.patch
lttng-instrumentation-powerpc.patch
lttng-instrumentation-ppc.patch
lttng-instrumentation-sh.patch
lttng-instrumentation-sh64.patch
lttng-instrumentation-sparc.patch
lttng-instrumentation-x86_64.patch
lttng-instrumentation-s390.patch        #FIXME: syscall trace 8 bit.

There seems to be some interest on LKML for me to submit those patches.
Early feedback would be appreciated.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* Re: patches to actually use markers?
  2007-10-31 16:58     ` Mathieu Desnoyers
@ 2007-10-31 18:12       ` Mathieu Desnoyers
  2007-11-01  3:34         ` Mathieu Desnoyers
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-10-31 18:12 UTC (permalink / raw)
  To: David Smith; +Cc: ltt-dev, Systemtap List

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * David Smith (dsmith@redhat.com) wrote:
> > Mathieu,
> > 
> > Thanks for the pointers, I'll take a look at your patches.
> 
> Please get the latest version then :
> 
> http://ltt.polymtl.ca/lttng/patch-2.6.23-mm1-lttng-0.10-pre11.tar.bz2
> 
> The patches I have are :
> 
> (no patch header yet)

Supplementary note : the "kernel trace thread flag" patches are only
useful for the architecture-specific syscall entry/exit tracing, and
therefore not required for the architecture independent markup.

So the minimal patchset would be:

lttng-instrument-kernelh.patch # NOT FOR UPSTREAM
#
lttng-instrumentation-fs.patch
lttng-instrumentation-ipc.patch
lttng-instrumentation-kernel.patch
lttng-instrumentation-mm.patch
lttng-instrumentation-net.patch

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* Re: patches to actually use markers?
  2007-10-31 18:12       ` Mathieu Desnoyers
@ 2007-11-01  3:34         ` Mathieu Desnoyers
  0 siblings, 0 replies; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-11-01  3:34 UTC (permalink / raw)
  To: David Smith; +Cc: ltt-dev, Systemtap List

* Mathieu Desnoyers (compudj@krystal.dyndns.org) wrote:
> * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> > * David Smith (dsmith@redhat.com) wrote:
> > > Mathieu,
> > > 
> > > Thanks for the pointers, I'll take a look at your patches.
> > 

New version :

http://ltt.polymtl.ca/lttng/patch-2.6.23-mm1-lttng-0.10-pre12.tar.bz2

- Reordered the series files to have the patches with earlier submission
  on top.
- Ran the core LTTng files through checkpatch.pl. Removed dead code.
  (did not run architecture specific instrumentation through
  checkpatch.pl)

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* Re: patches to actually use markers?
  2007-10-31 16:29   ` David Smith
  2007-10-31 16:58     ` Mathieu Desnoyers
@ 2007-11-16 19:13     ` David Smith
  2007-11-16 19:24       ` Mathieu Desnoyers
  1 sibling, 1 reply; 20+ messages in thread
From: David Smith @ 2007-11-16 19:13 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: ltt-dev, Systemtap List

David Smith wrote:
> Mathieu Desnoyers wrote:
>> * David Smith (dsmith@redhat.com) wrote:
>>> Mathieu,
>>>
>>> Now that the markers facility itself has made it in the kernel, do
>>> you have plans on trying to send patches that actually use markers to
>>> lkml?
>>>
>>> For systemtap's use, we'd like to get some actual markers in the
>>> upstream kernel.  Off the top of my head, we might start with adding
>>> markers to system calls (sys_*) that contain the system call's
>>> argument(s).
>>>
>>
>> Hi David,
>>
>> Yes, we have something similar in LTTng, we instrument many widely used
>> system calls to get the detailed arguments.

...

>> It's a good thing that we start having a discussion about these marker
>> sites at this point.
>>
>> Mathieu

I've been looking at your system call tracing patches.  (I've tried
running lttv itself without much luck, but it doesn't really matter for
the sake of this discussion.)

I like the way you use the existing system call tracing points.  So
we're on the same page, here are the markers I'm seeing in
arch/x86/kernel/ptrace32.c after applying
patch-2.6.24-rc2-lttng-0.10-pre23.tar.bz2:

  trace_mark(kernel_arch_syscall_entry, "syscall_id %d ip #p%ld",
			(int)regs->orig_eax, instruction_pointer(regs));

  trace_mark(kernel_arch_syscall_exit, MARK_NOARGS);

For systemtap use, we'd like to have more information than that.  On
syscall entry, we'd like to be able to get the arguments.  On syscall exit,
we'd like to be able to get the return value.  In fact, the easiest
thing would be to supply the same information that audit_syscall_entry()
and audit_syscall_exit() need.
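
To make the request concrete, here is a minimal userspace sketch (plain
C, not kernel code) of the kind of payload we have in mind.  The struct
and function names are hypothetical and are not part of the markers
API; the choice of six raw arguments is also just an assumption for the
example:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layouts; neither is part of the markers API. */
struct syscall_entry_info {
	int id;        /* system call number */
	long args[6];  /* raw argument values, undecoded */
};

struct syscall_exit_info {
	long ret;      /* raw return value */
};

/* Render an entry event the way a probe handler might log it. */
int format_entry(char *buf, size_t len, const struct syscall_entry_info *e)
{
	assert(buf && e);
	return snprintf(buf, len, "syscall_id %d args %ld %ld %ld %ld %ld %ld",
			e->id, e->args[0], e->args[1], e->args[2],
			e->args[3], e->args[4], e->args[5]);
}

/* Render an exit event carrying only the return value. */
int format_exit(char *buf, size_t len, const struct syscall_exit_info *x)
{
	assert(buf && x);
	return snprintf(buf, len, "ret %ld", x->ret);
}
```

The entry record deliberately leaves the arguments undecoded; any
interpretation (for example, treating one of them as a string pointer)
would be up to the handler.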

Since I'll bet you've already considered this, I'd like to know why you
decided to go a different way.

Thanks.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

* Re: patches to actually use markers?
  2007-11-16 19:13     ` David Smith
@ 2007-11-16 19:24       ` Mathieu Desnoyers
  2007-11-16 19:56         ` Frank Ch. Eigler
  2007-11-20 15:22         ` David Smith
  0 siblings, 2 replies; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-11-16 19:24 UTC (permalink / raw)
  To: David Smith; +Cc: ltt-dev, Systemtap List

* David Smith (dsmith@redhat.com) wrote:
> David Smith wrote:
> > Mathieu Desnoyers wrote:
> >> * David Smith (dsmith@redhat.com) wrote:
> >>> Mathieu,
> >>>
> >>> Now that the markers facility itself has made it in the kernel, do
> >>> you have plans on trying to send patches that actually use markers to
> >>> lkml?
> >>>
> >>> For systemtap's use, we'd like to get some actual markers in the
> >>> upstream kernel.  Off the top of my head, we might start with adding
> >>> markers to system calls (sys_*) that contain the system call's
> >>> argument(s).
> >>>
> >>
> >> Hi David,
> >>
> >> Yes, we have something similar in LTTng, we instrument many widely used
> >> system calls to get the detailed arguments.
> 
> ...
> 
> >> It's a good thing that we start having a discussion about these marker
> >> sites at this point.
> >>
> >> Mathieu
> 
> I've been looking at your system call tracing patches.  (I've tried
> running lttv itself without much luck, but it doesn't really matter for
> the sake of this discussion.)
> 
> I like the way you use the existing system call tracing points.  So
> we're on the same page, here are the markers I'm seeing in
> arch/x86/kernel/ptrace32.c after applying
> patch-2.6.24-rc2-lttng-0.10-pre23.tar.bz2:
> 
>   trace_mark(kernel_arch_syscall_entry, "syscall_id %d ip #p%ld",
> 			(int)regs->orig_eax, instruction_pointer(regs));
> 
>   trace_mark(kernel_arch_syscall_exit, MARK_NOARGS);
> 
> For systemtap use, we'd like to have more information than that.  On
> syscall entry, we'd like to be able to get the arguments.  On syscall exit,
> we'd like to be able to get the return value.  In fact, the easiest
> thing would be to supply the same information that audit_syscall_entry()
> and audit_syscall_exit() need.
> 
> Since I'll bet you've already considered this, I'd like to know why you
> decided to go a different way.
> 

Well, the approach taken was to instrument each important system call in
the syscall specific function to be able to actually know what type of
information to record. For instance, if ebx points to a string, the
pointer is not very useful, but the string is.
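
As a rough illustration of that difference, here is a userspace sketch
in plain C (this is not taken from the LTTng patches; the function
names and the "fs_open" event name are made up for the example).  The
generic hook can only print a raw register value, while a hook placed
inside the syscall-specific function knows the argument is a filename:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Generic hook: it only sees raw register values, so a pointer
 * argument can be logged as nothing more than a number.
 */
int log_generic(char *buf, size_t len, int id, unsigned long arg0)
{
	assert(buf);
	return snprintf(buf, len, "syscall_id %d arg0 0x%lx", id, arg0);
}

/*
 * Hypothetical per-syscall hook, as if placed inside an open()
 * implementation: the argument types are known here, so the string
 * itself can be recorded instead of the pointer.
 */
int log_open(char *buf, size_t len, const char *filename, int flags)
{
	assert(buf && filename);
	return snprintf(buf, len, "fs_open filename %s flags %d",
			filename, flags);
}
```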

You have a good point for the syscall exit instrumentation : adding the
return value is trivial and would be very useful.

Could we do better ?

> Thanks.
> 
> -- 
> David Smith
> dsmith@redhat.com
> Red Hat
> http://www.redhat.com
> 256.217.0141 (direct)
> 256.837.0057 (fax)

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* Re: patches to actually use markers?
  2007-11-16 19:24       ` Mathieu Desnoyers
@ 2007-11-16 19:56         ` Frank Ch. Eigler
  2007-11-16 20:10           ` Mathieu Desnoyers
  2007-11-20 15:22         ` David Smith
  1 sibling, 1 reply; 20+ messages in thread
From: Frank Ch. Eigler @ 2007-11-16 19:56 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: David Smith, ltt-dev, Systemtap List

Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> writes:

> [...]
>>   trace_mark(kernel_arch_syscall_entry, "syscall_id %d ip #p%ld",
>> 			(int)regs->orig_eax, instruction_pointer(regs));
>> [...]
>> For systemtap use, we'd like to have more information than that.  On
>> syscall entry, we'd like to be able to get the arguments, [...]
>
> Well, the approach taken was to instrument each important system call in
> the syscall specific function to be able to actually know what type of
> information to record. For instance, if ebx points to a string, the
> pointer is not very useful, but the string is.

How would this syscall specific function get ebx or the string,
without ebx (or regs) being passed as marker arguments?

For systemtap's purposes, we're prepared to have per-systemcall logic
to decode arguments further, in particular to extract user-space
strings.  But we need to know where to look for them!


> You have a good point for the syscall exit instrumentation : adding the
> return value is trivial and would be very useful.

(And an errno code if it's separate.)


- FChE

* Re: patches to actually use markers?
  2007-11-16 19:56         ` Frank Ch. Eigler
@ 2007-11-16 20:10           ` Mathieu Desnoyers
  2007-11-16 20:27             ` Frank Ch. Eigler
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-11-16 20:10 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: David Smith, ltt-dev, Systemtap List

* Frank Ch. Eigler (fche@redhat.com) wrote:
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> writes:
> 
> > [...]
> >>   trace_mark(kernel_arch_syscall_entry, "syscall_id %d ip #p%ld",
> >> 			(int)regs->orig_eax, instruction_pointer(regs));
> >> [...]
> >> For systemtap use, we'd like to have more information than that.  On
> >> syscall entry, we'd like to be able to get the arguments, [...]
> >
> > Well, the approach taken was to instrument each important system call in
> > the syscall specific function to be able to actually know what type of
> > information to record. For instance, if ebx points to a string, the
> > pointer is not very useful, but the string is.
> 
> How would this syscall specific function get ebx or the string,
> without ebx (or regs) being passed as marker arguments?
> 
That's the idea: in the syscall-specific function (not in
syscall_trace()), we add another marker that takes the syscall-specific
arguments as parameters. I think we use the same approach there.

What I was saying is that we can't extract the string from
syscall_trace() because we have no idea it is a string.


> For systemtap's purposes, we're prepared to have per-systemcall logic
> to decode arguments further, in particular to extract user-space
> strings.  But we need to know where to look for them!
> 
> 
> > You have a good point for the syscall exit instrumentation : adding the
> > return value is trivial and would be very useful.
> 
> (And an errno code if it's separate.)
> 

AFAIK, the errno code is generated by the userspace libraries using the
return value of the syscall, so it would be the same.

Mathieu

> 
> - FChE

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* Re: patches to actually use markers?
  2007-11-16 20:10           ` Mathieu Desnoyers
@ 2007-11-16 20:27             ` Frank Ch. Eigler
  2007-11-16 20:35               ` Mathieu Desnoyers
  0 siblings, 1 reply; 20+ messages in thread
From: Frank Ch. Eigler @ 2007-11-16 20:27 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: ltt-dev, Systemtap List

Hi -

On Fri, Nov 16, 2007 at 03:10:15PM -0500, Mathieu Desnoyers wrote:
> [...]
> > How would this syscall specific function get ebx or the string,
> > without ebx (or regs) being passed as marker arguments?

> That's the idea : in the syscall specific function (not in
> syscall_trace()), we add another marker that takes the syscall
> specific arguments as parameter. I think we use the same approach
> there.

I see.  Yes, per-systemcall markers would be welcomed by our group, and
ones not dependent on TIF_TRACE or whatnot even more so.  But we're
trying not to get too optimistic.


> What I was saying is that we can't extract the string from
> syscall_trace() because we have no idea it is a string.

If "we" is a marker callback function that is given the system call
number, it can be taught.  This is the sort of thing we do currently
in systemtap script code based upon kprobes.
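
A minimal sketch of what "taught" could mean, written as userspace C
rather than systemtap script or kernel code.  The table, the syscall
numbers and the "demo_" names are all invented for the example, and the
string argument is dereferenced directly here, whereas a real kernel
handler would have to copy it from user space:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum arg_kind { ARG_NONE, ARG_INT, ARG_STR };

/* Hypothetical per-syscall description: how to read each argument. */
struct syscall_desc {
	int id;
	const char *name;
	enum arg_kind kinds[2];
};

static const struct syscall_desc table[] = {
	{ 1, "demo_exit", { ARG_INT, ARG_NONE } },
	{ 5, "demo_open", { ARG_STR, ARG_INT } },
};

/*
 * Decode raw arguments using the table.  Returns 0 if the syscall
 * number is known, -1 (with a generic name in buf) otherwise.
 */
int decode_syscall(char *buf, size_t len, int id, const uintptr_t args[2])
{
	size_t i, n = sizeof(table) / sizeof(table[0]);

	for (i = 0; i < n; i++) {
		const struct syscall_desc *d = &table[i];
		int k, off;

		if (d->id != id)
			continue;
		off = snprintf(buf, len, "%s", d->name);
		for (k = 0; k < 2 && off >= 0 && (size_t)off < len; k++) {
			if (d->kinds[k] == ARG_INT)
				off += snprintf(buf + off, len - off, " %ld",
						(long)args[k]);
			else if (d->kinds[k] == ARG_STR)
				off += snprintf(buf + off, len - off, " \"%s\"",
						(const char *)args[k]);
		}
		return 0;
	}
	snprintf(buf, len, "syscall_%d", id);
	return -1;
}
```

Unknown syscall numbers still produce an event, just without decoded
arguments, which is the fallback behaviour a generic hook gives you.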

- FChE

* Re: patches to actually use markers?
  2007-11-16 20:27             ` Frank Ch. Eigler
@ 2007-11-16 20:35               ` Mathieu Desnoyers
  2007-11-16 20:43                 ` Frank Ch. Eigler
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-11-16 20:35 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ltt-dev, Systemtap List

* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi -
> 
> On Fri, Nov 16, 2007 at 03:10:15PM -0500, Mathieu Desnoyers wrote:
> > [...]
> > > How would this syscall specific function get ebx or the string,
> > > without ebx (or regs) being passed as marker arguments?
> 
> > That's the idea : in the syscall specific function (not in
> > syscall_trace()), we add another marker that takes the syscall
> > specific arguments as parameter. I think we use the same approach
> > there.
> 
> I see.  Yes, per-systemcall markers would be welcome by our group, and
> ones not dependent on TIF_TRACE or whatnot even more so.  But we're
> trying not to get too optimistic.
> 

I use per-systemcall markers for the principally useful systemcalls, but
I also instrument syscall_trace() to get all the other syscalls (new
ones, etc..).

I add my own TIF_KERNEL_TRACE, which is a thread flag enabled in each
and every thread when tracing is active. I think both have their own
advantages (complete information vs. instrumentation of every system
call, even the less important ones).

> 
> > What I was saying is that we can't extract the string from
> > syscall_trace() because we have no idea it is a string.
> 
> If "we" is a marker callback function that is given the system call
> number, it can be taught.  This is the sort of thing we do currently
> in systemtap script code based upon kprobes.
> 

Yeah.. but I fear that within the kernel it can quickly become very
ugly.

Mathieu

> - FChE

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* Re: patches to actually use markers?
  2007-11-16 20:35               ` Mathieu Desnoyers
@ 2007-11-16 20:43                 ` Frank Ch. Eigler
  2007-11-16 22:03                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 20+ messages in thread
From: Frank Ch. Eigler @ 2007-11-16 20:43 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: ltt-dev, Systemtap List

Hi -

On Fri, Nov 16, 2007 at 03:35:39PM -0500, Mathieu Desnoyers wrote:
> [...]
> > I see.  Yes, per-systemcall markers would be welcome by our group, and
> > ones not dependent on TIF_TRACE or whatnot even more so.  But we're
> > trying not to get too optimistic.
> 
> I use per-systemcall markers for the principally useful systemcalls, but
> I also instrument syscall_trace() to get all the other syscalls (new
> ones, etc..).

So then some system calls would get duplicate trace reports, and some
would not get arguments at all?  Does not sound ideal.

> I add my own TIF_KERNEL_TRACE, which is a thread flag enabled in
> each and every thread when tracing is active.  [...]

Who has responsibility to manage this flag?  Would it be reference
counted, so that if e.g. two ltt and a third systemtap script all hook
up to these markers, the flag will stay set?  It would be nice to
measure the impact of ordinary, unconditional markers in the
system-call functions.

> > If "we" is a marker callback function that is given the system call
> > number, it can be taught.  This is the sort of thing we do currently
> > in systemtap script code based upon kprobes.
> 
> Yeah.. but I fear that within the kernel it can become quickly very
> ugly.

It's an inherent tradeoff between a small generic hook versus many
specialized hooks.  Look how the audit system deals with decoding
syscalls.  It's not THAT bad.

- FChE

* Re: patches to actually use markers?
  2007-11-16 20:43                 ` Frank Ch. Eigler
@ 2007-11-16 22:03                   ` Mathieu Desnoyers
  2007-11-16 22:35                     ` Frank Ch. Eigler
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-11-16 22:03 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ltt-dev, Systemtap List

* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi -
> 
> On Fri, Nov 16, 2007 at 03:35:39PM -0500, Mathieu Desnoyers wrote:
> > [...]
> > > I see.  Yes, per-systemcall markers would be welcome by our group, and
> > > ones not dependent on TIF_TRACE or whatnot even more so.  But we're
> > > trying not to get too optimistic.
> > 
> > I use per-systemcall markers for the principally useful systemcalls, but
> > I also instrument syscall_trace() to get all the other syscalls (new
> > ones, etc..).
> 
> So then some system calls would get duplicate trace reports, and some
> would not get arguments at all?  Does not sound ideal.
> 

We currently have three distinct events for a system call :

syscall entry, with syscall id and instruction pointer
the syscall specific instrumentation (opt)
syscall exit

One benefit of having syscall entry/exit with minimal information
is that we can put them really close to the "real" event, i.e. passing
from userspace to kernel space. It becomes useful when people want a
precise accounting of the kernel vs userspace time. Therefore, the
results will be as close as possible to results taken by a profiler.

Passing limited information to the syscall entry/exit
instrumentation helps us know the number of cycles wrongly accounted. We
do not currently alter the statistics to take that into account, but we
plan to do this in the future. Anything complicated could cause
the number of wrongly accounted cycles to vary between events, which
is unwanted.

Instrumentation within the syscall specific function helps knowing
when/if the operation has really been done _within the kernel_. It may
imply putting the event within the bounds of existing locks to be as
sure as possible two related events happening on different CPUs won't be
in the wrong order. Ideally, the instrumentation of the syscall "effect
on the internal data structures of the kernel" should be as close as
possible to the actual memory modification.

Given these two opposite sets of constraints, I think having more than
one instrumentation site per syscall makes sense. Moreover, markers are
really cheap... :)

> > I add my own TIF_KERNEL_TRACE, which is a thread flag enabled in
> > each and every thread when tracing is active.  [...]
> 
> Who has responsibility to manage this flag?  Would it be reference
> counted, so that e.g.  two ltt and a third systemtap script all hook
> up to these markers, the flag will stay set?  It would be nice to
> measure the impact of ordinary, unconditional markers in the
> system-call functions.
> 

Already did. With inactive markers under high memory pressure, we must
do 2 memory reads (that's the cycle difference we get). If they are in
cache, it's hard to see a difference. I think I've documented that in
the markers or immediate values patch header.

For active markers, I did some testing a while ago. I could dig through
the ML to find these results.

Yes, refcount would be the way to go. The code is currently in
kernel/sched.c, since it touches the threads. I would have to add the
refcount. It will be in the next LTTng prerelease.
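
A minimal sketch of the reference counting, in userspace C with a
single global standing in for the per-thread flag.  The function names
are hypothetical and this is not the code in kernel/sched.c; a real
implementation would also need locking and would have to set or clear
the flag in every thread:

```c
#include <assert.h>
#include <stdbool.h>

static int trace_users;   /* number of tracers currently attached */
static bool flag_set;     /* stands in for the per-thread trace flag */

/* First user turns the flag on; later users only bump the count. */
void trace_flag_get(void)
{
	if (trace_users++ == 0)
		flag_set = true;
}

/* Last user to leave turns the flag off. */
void trace_flag_put(void)
{
	assert(trace_users > 0);
	if (--trace_users == 0)
		flag_set = false;
}

bool trace_flag_is_set(void)
{
	return flag_set;
}
```

With this scheme one tracer detaching does not disable tracing for the
others that are still attached.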

> > > If "we" is a marker callback function that is given the system call
> > > number, it can be taught.  This is the sort of thing we do currently
> > > in systemtap script code based upon kprobes.
> > 
> > Yeah.. but I fear that within the kernel it can become quickly very
> > ugly.
> 
> It's an inherent tradeoff between a small generic hook versus many
> specialized hooks.  Look how the audit system deals with decoding
> syscalls.  It's not THAT bad.
> 

Hrm, it's just that it centralizes something that would be better left
to each subsystem's experts: which information specific to a given
system call is interesting, and when is the best moment to record it.
Just like I would leave to the architecture experts the final word on
when it's best to record the system call entry/exit event.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* Re: patches to actually use markers?
  2007-11-16 22:03                   ` Mathieu Desnoyers
@ 2007-11-16 22:35                     ` Frank Ch. Eigler
  2007-11-20 16:50                       ` Mathieu Desnoyers
  0 siblings, 1 reply; 20+ messages in thread
From: Frank Ch. Eigler @ 2007-11-16 22:35 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: ltt-dev, Systemtap List

Hi -

On Fri, Nov 16, 2007 at 05:03:14PM -0500, Mathieu Desnoyers wrote:
> [...]
> We currently have three distinct events for a system call :
> 
> syscall entry, with syscall id and instruction pointer
> the syscall specific instrumentation (opt)
> syscall exit

> [...]  Instrumentation within the syscall specific function helps
> knowing when/if the operation has really been done _within the
> kernel_. [...]

Not just that - but *what* the actual operation was.

> [...] Given these two opposite sets of constraints, I think having
> more than one instrumentation site per syscall makes sense.

Sure - what bothers me is the satisfaction with the inconsistency of
some system calls having no specific markers.

> Moreover, markers are really cheap... :)

I'm not the one who must buy what we're selling - it's the kernel
maintainers. :-)

> [...]  Yes, refcount would be the way to go. The code is currently
> in kernel/sched.c, since it touches the threads. I would have to add
> the refcount. It will be in the next LTTng prerelease.

But you see, if markers are not just really cheap but really really
cheap, then you don't need the task flag, nor the new API for
refcounting the flags' clients, nor the new machinery to propagate the
flag to new tasks.  You just put unconditional markers in there and
let the possible multiple marker handlers do their own filtering.
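
A minimal userspace sketch of that arrangement, not kernel code: the
marker site fires unconditionally and every registered handler decides
for itself whether the event is interesting.  All names here
(marker_fire, register_handler, struct pid_filter) are hypothetical and
only stand in for the real marker API.

```c
#include <stddef.h>

#define MAX_HANDLERS 4

/* Hypothetical event payload passed from the marker site. */
struct event {
	int pid;
	int syscall_id;
};

typedef void (*handler_fn)(const struct event *ev, void *priv);

struct handler {
	handler_fn fn;
	void *priv;
};

static struct handler handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Returns 0 on success, -1 when the handler table is full. */
int register_handler(handler_fn fn, void *priv)
{
	if (nr_handlers == MAX_HANDLERS)
		return -1;
	handlers[nr_handlers].fn = fn;
	handlers[nr_handlers].priv = priv;
	nr_handlers++;
	return 0;
}

/* Stands in for an unconditional marker: no per-task flag is consulted. */
void marker_fire(const struct event *ev)
{
	for (size_t i = 0; i < nr_handlers; i++)
		handlers[i].fn(ev, handlers[i].priv);
}

/* Per-handler filter state: each client keeps its own. */
struct pid_filter {
	int wanted_pid;
	unsigned long hits;
};

/* The filtering lives in the handler, not at the marker site. */
void count_for_pid(const struct event *ev, void *priv)
{
	struct pid_filter *f = priv;

	if (ev->pid != f->wanted_pid)
		return;
	f->hits++;
}
```

Two clients interested in different pids would each register
count_for_pid with their own struct pid_filter; neither needs a shared
flag or a refcount on it.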


- FChE

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patches to actually use markers?
  2007-11-16 19:24       ` Mathieu Desnoyers
  2007-11-16 19:56         ` Frank Ch. Eigler
@ 2007-11-20 15:22         ` David Smith
  2007-11-20 16:22           ` [Ltt-dev] " Mathieu Desnoyers
  2007-11-20 18:46           ` Mathieu Desnoyers
  1 sibling, 2 replies; 20+ messages in thread
From: David Smith @ 2007-11-20 15:22 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: ltt-dev, Systemtap List

Mathieu Desnoyers wrote:
> * David Smith (dsmith@redhat.com) wrote:
>>>> Mathieu
>> I've been looking at your system call tracing patches.  (I've tried
>> running lttv itself without much luck, but it doesn't really matter for
>> the sake of this discussion.)
>>
>> I like the way you use the existing system call tracing points.  So
>> we're on the same page, here are the markers I'm seeing in
>> arch/x86/kernel/ptrace32.c after applying
>> patch-2.6.24-rc2-lttng-0.10-pre23.tar.bz2:
>>
>>   trace_mark(kernel_arch_syscall_entry, "syscall_id %d ip #p%ld",
>> 			(int)regs->orig_eax, instruction_pointer(regs));
>>
>>   trace_mark(kernel_arch_syscall_exit, MARK_NOARGS);
>>
>> For systemtap use, we'd like to have more information than that.  On
>> syscall entry, we'd like to be able to get the arguments.  On syscall exit,
>> we'd like to be able to get the return value.  In fact, the easiest
>> thing would be to supply the same information that audit_syscall_entry()
>> and audit_syscall_exit() need.
>>
>> Since I'll bet you've already considered this, I'd like to know why you
>> decided to go a different way.
>>
> Well, the approach taken was to instrument each important system call in
> the syscall specific function to be able to actually know what type of
> information to record. For instance, if ebx points to a string, the
> pointer is not very useful, but the string is.

That is (somewhat) true in the case of strings.

But, similar problems exist with syscalls that take structure pointers:
sys_[gs]ettimeofday, sys_adjtimex, sys_times, sys_nanosleep,
sys_[gs]etitimer, sys_timer_create, sys_timer_[gs]ettime,
sys_clock_gettime, sys_clock_getres, sys_clock_nanosleep,
sys_sched_setscheduler, sys_sched_[gs]etparam, sys_wait4, sys_waitid,
sys_rt_sigtimedwait, sys_stat, sys_statfs[64], sys_fstatfs[64],
sys_lstat, sys_fstat, and so on (I got tired of looking through syscalls.h).

For those syscalls only a pointer can be passed so the marker handler
will have to know how to handle that pointer.  That marker handler will
need to know that that value is a pointer to a particular structure type
and then know how to access it accordingly.

The same could be done for strings.  Is it a little more work?  Yes.  Is
it fairly easy?  Yes.
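
A minimal userspace sketch of what such a handler could look like, under
loud assumptions: the syscall ids, struct fake_timeval, and every
function name below are hypothetical, and a real handler would have to
copy from user memory safely instead of dereferencing the pointer
directly.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical syscall ids, for illustration only. */
enum { SYS_FAKE_OPEN = 1, SYS_FAKE_SETTIMEOFDAY = 2 };

/* Stand-in for a structure passed by pointer. */
struct fake_timeval {
	long tv_sec;
	long tv_usec;
};

/* A decoder formats the raw first argument into buf. */
typedef int (*arg_decoder)(uintptr_t raw, char *buf, size_t len);

static int decode_string(uintptr_t raw, char *buf, size_t len)
{
	return snprintf(buf, len, "\"%s\"", (const char *)raw);
}

static int decode_timeval(uintptr_t raw, char *buf, size_t len)
{
	const struct fake_timeval *tv = (const struct fake_timeval *)raw;

	return snprintf(buf, len, "{%ld, %ld}", tv->tv_sec, tv->tv_usec);
}

static int decode_raw(uintptr_t raw, char *buf, size_t len)
{
	return snprintf(buf, len, "0x%lx", (unsigned long)raw);
}

/* The handler's knowledge of "what does arg0 point to" lives here. */
static arg_decoder lookup_decoder(int syscall_id)
{
	switch (syscall_id) {
	case SYS_FAKE_OPEN:
		return decode_string;
	case SYS_FAKE_SETTIMEOFDAY:
		return decode_timeval;
	default:
		return decode_raw;
	}
}

/* Generic entry handler: it only receives the id and a raw value. */
int handle_syscall_entry(int syscall_id, uintptr_t arg0,
			 char *buf, size_t len)
{
	return lookup_decoder(syscall_id)(arg0, buf, len);
}
```

Strings and structures go through the same lookup; the only difference
is which decoder the table returns.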

Let me ask the question another way.  Is there a (measurable)
performance hit if the extra arguments to the syscall entry marker are
added?  If not, even if lttng doesn't plan to use them, why not add
them?  Certainly systemtap (and perhaps other users) could use them.

> You have a good point for the syscall exit instrumentation : adding the
> return value is trivial and would be very useful.

I'm glad we agree that adding the return value is useful and trivial.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ltt-dev] patches to actually use markers?
  2007-11-20 15:22         ` David Smith
@ 2007-11-20 16:22           ` Mathieu Desnoyers
  2007-11-20 20:43             ` David Smith
  2007-11-20 18:46           ` Mathieu Desnoyers
  1 sibling, 1 reply; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-11-20 16:22 UTC (permalink / raw)
  To: David Smith; +Cc: ltt-dev, Systemtap List

* David Smith (dsmith@redhat.com) wrote:
> Mathieu Desnoyers wrote:
> > * David Smith (dsmith@redhat.com) wrote:
> >>>> Mathieu
> >> I've been looking at your system call tracing patches.  (I've tried
> >> running lttv itself without much luck, but it doesn't really matter for
> >> the sake of this discussion.)
> >>
> >> I like the way you use the existing system call tracing points.  So
> >> we're on the same page, here are the markers I'm seeing in
> >> arch/x86/kernel/ptrace32.c after applying
> >> patch-2.6.24-rc2-lttng-0.10-pre23.tar.bz2:
> >>
> >>   trace_mark(kernel_arch_syscall_entry, "syscall_id %d ip #p%ld",
> >> 			(int)regs->orig_eax, instruction_pointer(regs));
> >>
> >>   trace_mark(kernel_arch_syscall_exit, MARK_NOARGS);
> >>
> >> For systemtap use, we'd like to have more information than that.  On
> >> syscall entry, we'd like to be able to get the arguments.  On syscall exit,
> >> we'd like to be able to get the return value.  In fact, the easiest
> >> thing would be to supply the same information that audit_syscall_entry()
> >> and audit_syscall_exit() need.
> >>
> >> Since I'll bet you've already considered this, I'd like to know why you
> >> decided to go a different way.
> >>
> > Well, the approach taken was to instrument each important system call in
> > the syscall specific function to be able to actually know what type of
> > information to record. For instance, if ebx points to a string, the
> > pointer is not very useful, but the string is.
> 
> That is (somewhat) true in the case of strings.
> 
> But, similar problems exist with syscalls that take structure pointers:
> sys_[gs]ettimeofday, sys_adjtimex, sys_times, sys_nanosleep,
> sys_[gs]etitimer, sys_timer_create, sys_timer_[gs]ettime,
> sys_clock_gettime, sys_clock_getres, sys_clock_nanosleep,
> sys_sched_setscheduler, sys_sched_[gs]etparam, sys_wait4, sys_waitid,
> sys_rt_sigtimedwait, sys_stat, sys_statfs[64], sys_fstatfs[64],
> sys_lstat, sys_fstat, and so on (I got tired of looking through syscalls.h).
> 
> For those syscalls only a pointer can be passed so the marker handler
> will have to know how to handle that pointer.  That marker handler will
> need to know that that value is a pointer to a particular structure type
> and then know how to access it accordingly.
> 
> The same could be done for strings.  Is it a little more work?  Yes.  Is
> it fairly easy?  Yes.
> 
> Let me ask the question another way.  Is there a (measurable)
> performance hit if the extra arguments to the syscall entry marker are
> added?  If not, even if lttng doesn't plan to use them, why not add
> them?  Certainly systemtap (and perhaps other users) could use them.
> 

Yup, I'd be all in for flexibility, and the performance impact should be
small. I just wonder if the best approach is to pass the pt_regs pointer
as a marker argument or to pass the individual registers.

Since the LTTng serializer uses the format string to generically take
the arguments and write them in a trace, I doubt that writing a pt_regs
pointer is really useful. On the other hand, passing all the individual
registers would imply a stack setup cost at runtime (small cost though),
but would provide somewhat meaningful information in the traces (but
redundant if we instrument the in-kernel functions).

Both approaches would let specific probes deal with the syscall
arguments as they like.

If we choose to go for the pt_regs pointer passing solution, we could
add a format string extension to specify that a given argument should
not be written in the trace. If we pass the pt_regs like this :

  trace_mark(syscall_entry, "syscall_id %lu ip %p pt_regs #0%p",
    regs->eax, instruction_pointer(regs), regs);

An LTTng probe would know that the #0 (# is a prefix to the format
string element that tells LTTng what type size and format to use in the
trace, independent of the size used on the gcc side) means that the data
should be discarded from the trace.
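
A minimal sketch of how a probe might honour such a prefix, assuming a
simplified grammar in which an optional "#<prefix>" sits immediately
before each '%' conversion.  This is not the LTTng serializer; the
function name and the bitmask convention are hypothetical, and "%%" is
not handled.

```c
/*
 * Returns a bitmask with bit i set when argument i should be written
 * to the trace.  An argument whose conversion is prefixed by "#0" is
 * left out of the mask; any other prefix (e.g. "#p") is ignored here.
 */
unsigned int recorded_args_mask(const char *fmt)
{
	unsigned int mask = 0;
	unsigned int arg = 0;
	const char *p = fmt;

	while (*p != '\0' && arg < 32) {
		int discard = 0;

		if (*p == '#') {
			const char *start = p + 1;

			/* Skip the prefix up to the conversion it applies to. */
			p = start;
			while (*p != '\0' && *p != '%')
				p++;
			if (*p == '\0')
				break;	/* stray '#' with no conversion */
			discard = (p - start == 1 && *start == '0');
		}
		if (*p == '%') {
			if (!discard)
				mask |= 1u << arg;
			arg++;
		}
		p++;
	}
	return mask;
}
```

For the format string above, the first two arguments would be recorded
and the pt_regs pointer skipped, while a custom probe could still read
it from the argument list.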

My goal is still that adding instrumentation should be as easy as
possible in the general case, while permitting flexibility for custom
probes. Therefore, I'd prefer not to _require_ the implementation of
a syscall audit-like set of per-architecture probes, but I'd like to
leave room to implement one.

Mathieu

> > You have a good point for the syscall exit instrumentation : adding the
> > return value is trivial and would be very useful.
> 
> I'm glad we agree that adding the return value is useful and trivial.
> 
> -- 
> David Smith
> dsmith@redhat.com
> Red Hat
> http://www.redhat.com
> 256.217.0141 (direct)
> 256.837.0057 (fax)
> _______________________________________________
> Ltt-dev mailing list
> Ltt-dev@listserv.shafik.org
> http://listserv.shafik.org/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patches to actually use markers?
  2007-11-16 22:35                     ` Frank Ch. Eigler
@ 2007-11-20 16:50                       ` Mathieu Desnoyers
  0 siblings, 0 replies; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-11-20 16:50 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ltt-dev, Systemtap List

* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi -
> 
> On Fri, Nov 16, 2007 at 05:03:14PM -0500, Mathieu Desnoyers wrote:
> > [...]
> > We currently have three distinct events for a system call :
> > 
> > syscall entry, with syscall id and instruction pointer
> > the syscall specific instrumentation (opt)
> > syscall exit
> 
> > [...]  Instrumentation within the syscall specific function helps
> > knowing when/if the operation has really been done _within the
> > kernel_. [...]
> 
> Not just that - but *what* the actual operation was.
> 

One could argue that by saving the syscall parameters, we could probably
remove some internal kernel instrumentation because it would duplicate
the information.

> > [...] Given these two opposite sets of constraints, I think having
> > more than one instrumentation site per syscall makes sense.
> 
> Sure - what bothers me is the satisfaction with the inconsistency of
> some system calls having no specific markers.
> 

I don't see much difference between the two approaches : if you create
a syscall audit-like probe module, you will have to deal with each
architecture and describe each system call externally. Therefore, there
is a per system call / per architecture action required from the
developers. On the other hand, if we instrument the functions called by
these system calls, which are often in architecture independent code, we
only have to add instrumentation for each system call (removing the "per
architecture" multiplier).

We also come back to the distinction between maintaining a list of
system calls outside the actual kernel code base, making it harder to
follow the code flow, vs adding the instrumentation directly in the
kernel code.

Even if the probes are packaged with the kernel, adding a new system
call will require changing yet another list that keeps track of the
system calls for each architecture.

> > Moreover, markers are really cheap... :)
> 
> I'm not the one who must buy what we're selling - it's the kernel
> maintainers. :-)
> 
> > [...]  Yes, refcount would be the way to go. The code is currently
> > in kernel/sched.c, since it touches the threads. I would have to add
> > the refcount. It will be in the next LTTng prerelease.
> 
> But you see, if markers are not just really cheap but really really
> cheap, then you don't need the task flag, nor the new API for
> refcounting the flags' clients, nor the new machinery to propagate the
> flag to new tasks.  You just put unconditional markers in there and
> let the possible multiple marker handlers do their own filtering.
> 

In per-architecture _assembly_ code ??

Note that the markers are C macros. We put them in syscall_trace, which
is a C callback made exactly for this kind of purpose. However, there is
a performance penalty involved in setting up the stack and calling this
function, so we have to control whether or not it should be called. We
are not even talking about marker code there.
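
A rough userspace sketch of that gating, assuming a single global flag
with a client refcount.  The real flag is per-thread and lives in
architecture code, so every name below is hypothetical and the sketch
does no locking.

```c
/* Number of tracing clients that asked for the callback. */
static int trace_clients;

/* Checked on the (conceptually assembly) fast path. */
static int syscall_trace_flag;

/* Counts how often the C callback actually ran. */
static unsigned long trace_calls;

/* First client turns the flag on. */
void syscall_trace_get(void)
{
	if (trace_clients++ == 0)
		syscall_trace_flag = 1;
}

/* Last client turns the flag off again. */
void syscall_trace_put(void)
{
	if (trace_clients > 0 && --trace_clients == 0)
		syscall_trace_flag = 0;
}

/* Stands in for the C callback where the markers would sit. */
static void syscall_trace(int syscall_id)
{
	(void)syscall_id;
	trace_calls++;
}

/* Stands in for the entry path: the call cost is paid only if flagged. */
void syscall_entry_path(int syscall_id)
{
	if (syscall_trace_flag)
		syscall_trace(syscall_id);
}

unsigned long syscall_trace_calls(void)
{
	return trace_calls;
}
```

With the refcount, two tracers can enable the callback independently
and the flag only drops once both have released it.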

Mathieu

> 
> - FChE

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patches to actually use markers?
  2007-11-20 15:22         ` David Smith
  2007-11-20 16:22           ` [Ltt-dev] " Mathieu Desnoyers
@ 2007-11-20 18:46           ` Mathieu Desnoyers
  1 sibling, 0 replies; 20+ messages in thread
From: Mathieu Desnoyers @ 2007-11-20 18:46 UTC (permalink / raw)
  To: David Smith; +Cc: ltt-dev, Systemtap List

* David Smith (dsmith@redhat.com) wrote:
> Mathieu Desnoyers wrote:
> > You have a good point for the syscall exit instrumentation : adding the
> > return value is trivial and would be very useful.
> 
> I'm glad we agree that adding the return value is useful and trivial.
> 

It is trivial in cases where it is already used by syscall audit.
However, on architectures not implementing syscall audit, we cannot be
sure it's been written on the stack by the assembly code, since it's
never used (and this code is often heavily optimized).

So I can add the return values for architectures supporting syscall
audit quickly, but a thorough analysis of each of the other
architectures will be required before we proceed.

Mathieu

> -- 
> David Smith
> dsmith@redhat.com
> Red Hat
> http://www.redhat.com
> 256.217.0141 (direct)
> 256.837.0057 (fax)

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ltt-dev] patches to actually use markers?
  2007-11-20 16:22           ` [Ltt-dev] " Mathieu Desnoyers
@ 2007-11-20 20:43             ` David Smith
  0 siblings, 0 replies; 20+ messages in thread
From: David Smith @ 2007-11-20 20:43 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: ltt-dev, Systemtap List

Mathieu Desnoyers wrote:
> * David Smith (dsmith@redhat.com) wrote:
>> Mathieu Desnoyers wrote:
>>> * David Smith (dsmith@redhat.com) wrote:
>>>>>> Mathieu
>>>> I've been looking at your system call tracing patches.  (I've tried
>>>> running lttv itself without much luck, but it doesn't really matter for
>>>> the sake of this discussion.)
>>>>
>>>> I like the way you use the existing system call tracing points.  So
>>>> we're on the same page, here are the markers I'm seeing in
>>>> arch/x86/kernel/ptrace32.c after applying
>>>> patch-2.6.24-rc2-lttng-0.10-pre23.tar.bz2:
>>>>
>>>>   trace_mark(kernel_arch_syscall_entry, "syscall_id %d ip #p%ld",
>>>> 			(int)regs->orig_eax, instruction_pointer(regs));
>>>>
>>>>   trace_mark(kernel_arch_syscall_exit, MARK_NOARGS);
>>>>
>>>> For systemtap use, we'd like to have more information than that.  On
>>>> syscall entry, we'd like to be able to get the arguments.  On syscall exit,
>>>> we'd like to be able to get the return value.  In fact, the easiest
>>>> thing would be to supply the same information that audit_syscall_entry()
>>>> and audit_syscall_exit() need.
>>>>
>>>> Since I'll bet you've already considered this, I'd like to know why you
>>>> decided to go a different way.
>>>>
>>> Well, the approach taken was to instrument each important system call in
>>> the syscall specific function to be able to actually know what type of
>>> information to record. For instance, if ebx points to a string, the
>>> pointer is not very useful, but the string is.
>> That is (somewhat) true in the case of strings.
>>
>> But, similar problems exist with syscalls that take structure pointers:
>> sys_[gs]ettimeofday, sys_adjtimex, sys_times, sys_nanosleep,
>> sys_[gs]etitimer, sys_timer_create, sys_timer_[gs]ettime,
>> sys_clock_gettime, sys_clock_getres, sys_clock_nanosleep,
>> sys_sched_setscheduler, sys_sched_[gs]etparam, sys_wait4, sys_waitid,
>> sys_rt_sigtimedwait, sys_stat, sys_statfs[64], sys_fstatfs[64],
>> sys_lstat, sys_fstat, and so on (I got tired of looking through syscalls.h).
>>
>> For those syscalls only a pointer can be passed so the marker handler
>> will have to know how to handle that pointer.  That marker handler will
>> need to know that that value is a pointer to a particular structure type
>> and then know how to access it accordingly.
>>
>> The same could be done for strings.  Is it a little more work?  Yes.  Is
>> it fairly easy?  Yes.
>>
>> Let me ask the question another way.  Is there a (measurable)
>> performance hit if the extra arguments to the syscall entry marker are
>> added?  If not, even if lttng doesn't plan to use them, why not add
>> them?  Certainly systemtap (and perhaps other users) could use them.
> 
> Yup, I'd be all in for flexibility, and the performance impact should be
> small. I just wonder if the best approach is to pass the pt_regs pointer
> as a marker argument or to pass the individual registers.

Systemtap would rather have the individual registers than the pt_regs
pointer, since then we don't have to worry about the architecture
details of which registers should contain the args.  Since the
syscall_entry markers are in architecture-specific code, let that code
worry about architecture-dependent details.
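
A minimal sketch of that split, in plain userspace C: each
architecture's entry code unpacks its own argument registers and the
probe only ever sees an id and six values.  The cut-down register
structs and all function names are hypothetical; the register order
follows the usual i386 and x86_64 syscall conventions.

```c
/* Hypothetical, cut-down register layouts; real pt_regs has more fields. */
struct fake_i386_regs {
	long ebx, ecx, edx, esi, edi, ebp, orig_eax;
};

struct fake_x86_64_regs {
	long rdi, rsi, rdx, r10, r8, r9, orig_rax;
};

#define NR_SYSCALL_ARGS 6

struct syscall_event {
	long id;
	long args[NR_SYSCALL_ARGS];
};

/* Last event seen by the probe, kept only so the sketch can be inspected. */
static struct syscall_event last_event;

/* Architecture-neutral probe: it never looks at a register name. */
void syscall_entry_probe(long id, const long args[NR_SYSCALL_ARGS])
{
	last_event.id = id;
	for (int i = 0; i < NR_SYSCALL_ARGS; i++)
		last_event.args[i] = args[i];
}

const struct syscall_event *last_syscall_event(void)
{
	return &last_event;
}

/* i386 entry code knows that arguments arrive in ebx, ecx, edx, ... */
void i386_syscall_entry(const struct fake_i386_regs *r)
{
	const long args[NR_SYSCALL_ARGS] = {
		r->ebx, r->ecx, r->edx, r->esi, r->edi, r->ebp
	};

	syscall_entry_probe(r->orig_eax, args);
}

/* x86_64 entry code knows that arguments arrive in rdi, rsi, rdx, ... */
void x86_64_syscall_entry(const struct fake_x86_64_regs *r)
{
	const long args[NR_SYSCALL_ARGS] = {
		r->rdi, r->rsi, r->rdx, r->r10, r->r8, r->r9
	};

	syscall_entry_probe(r->orig_rax, args);
}
```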

> Since the LTTng serializer uses the format string to generically take
> the arguments and write them in a trace, I doubt that writing a pt_regs
> pointer is really useful. On the other hand, passing all the individual
> registers would imply a stack setup cost at runtime (small cost though),
> but would provide somewhat meaningful information in the traces (but
> redundant if we instrument the in-kernel functions).
> 
> Both approaches would let specific probes deal with the syscall
> arguments as they like.
> 
> If we choose to go for the pt_regs pointer passing solution, we could
> add a format string extension to specify that a given argument should
> not be written in the trace. If we pass the pt_regs like this :
> 
>   trace_mark(syscall_entry, "syscall_id %lu ip %p pt_regs #0%p",
>     regs->eax, instruction_pointer(regs), regs);
> 
> An LTTng probe would know that the #0 (# is a prefix to the format
> string element that tells LTTng what type size and format to use in the
> trace, independent of the size used on the gcc side) means that the data
> should be discarded from the trace.

As far as systemtap is concerned, I don't really have much of an opinion
on the '#0' format specifier, since systemtap will never use it
(systemtap users never see the format string anyway) and I believe we'd
rather have the individual registers anyway.

I'd suggest holding off on the '#0' until it is really needed.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-11-20 20:43 UTC | newest]

Thread overview: 20+ messages
2007-10-29 19:26 patches to actually use markers? David Smith
2007-10-29 22:05 ` Mathieu Desnoyers
2007-10-31 16:29   ` David Smith
2007-10-31 16:58     ` Mathieu Desnoyers
2007-10-31 18:12       ` Mathieu Desnoyers
2007-11-01  3:34         ` Mathieu Desnoyers
2007-11-16 19:13     ` David Smith
2007-11-16 19:24       ` Mathieu Desnoyers
2007-11-16 19:56         ` Frank Ch. Eigler
2007-11-16 20:10           ` Mathieu Desnoyers
2007-11-16 20:27             ` Frank Ch. Eigler
2007-11-16 20:35               ` Mathieu Desnoyers
2007-11-16 20:43                 ` Frank Ch. Eigler
2007-11-16 22:03                   ` Mathieu Desnoyers
2007-11-16 22:35                     ` Frank Ch. Eigler
2007-11-20 16:50                       ` Mathieu Desnoyers
2007-11-20 15:22         ` David Smith
2007-11-20 16:22           ` [Ltt-dev] " Mathieu Desnoyers
2007-11-20 20:43             ` David Smith
2007-11-20 18:46           ` Mathieu Desnoyers
