public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* RE: new syscall tapset checked in, but needs work
@ 2006-02-01 22:47 Stone, Joshua I
  2006-02-01 22:59 ` Frank Ch. Eigler
  0 siblings, 1 reply; 7+ messages in thread
From: Stone, Joshua I @ 2006-02-01 22:47 UTC (permalink / raw)
  To: Frank Ch. Eigler, Martin Hunt; +Cc: systemtap

Frank Ch. Eigler wrote:
> hunt wrote:
> 
>> [...]  I'm using an strace-like output format for the args, although
>> that really isn't possible for things like sys_read() where strace
>> shows the results that get written in the input buffer. It is
>> possible to write a systemtap script to do that, but not at the
>> tapset level.
> 
> If it is possible at the script level, it's possible at the tapset
> level (on purpose).  For instance, a tapset alias for syscall.read
> could store away its incoming args in an private array, until its
> .return (until bug #1382 is done).  All that code would be optimized
> away if not used.

Interesting idea - just to be explicit, is this what you had in mind?

# read _______________________________________________________
# ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
global __syscall_read_fd
global __syscall_read_buf
global __syscall_read_count
probe syscall.read = kernel.function("sys_read") {
      name = "read"
      __tid = tid()
      fd = __syscall_read_fd[__tid] = $fd
      buf_uaddr = __syscall_read_buf[__tid] = $buf
      count = __syscall_read_count[__tid] = $count
      argstr = sprintf("%d, ..., %d", fd, count)
}
probe syscall.read.return = kernel.function("sys_read").return {
      name = "read"
      __tid = tid()
      fd = __syscall_read_fd[__tid]
      buf_uaddr = __syscall_read_buf[__tid]
      count = __syscall_read_count[__tid]
      delete __syscall_read_fd[__tid]
      delete __syscall_read_buf[__tid]
      delete __syscall_read_count[__tid]
      # at this point we 
      argstr = sprintf("%d, ..., %d", fd, count)
      returnp = 1
}


Then person using the tapset would do something like:

probe syscall.read { /* capture arguments */ }
probe syscall.read.return {
    printf("read(%s) = %s\n", argstr, errno_str(returnval()))
}


This only works if there's only one instance of the return probe, else
the args will be deleted too soon.  If we don't delete the args though,
the map could fill quite easily.

By the way, I tried it without accessing the args, and the optimizer
didn't clean it up.
    probe syscall.read { /* capture arguments */ }
    probe syscall.read.return { }

I also notice this on testsuite/buildok/delete.stp - perhaps the
optimizer isn't handling delete statements?


Josh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: new syscall tapset checked in, but needs work
  2006-02-01 22:47 new syscall tapset checked in, but needs work Stone, Joshua I
@ 2006-02-01 22:59 ` Frank Ch. Eigler
  2006-02-02  0:15   ` Martin Hunt
  0 siblings, 1 reply; 7+ messages in thread
From: Frank Ch. Eigler @ 2006-02-01 22:59 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

Hi -

Josh wrote:
> [...]
> Interesting idea - just to be explicit, is this what you had in mind?

Yes.

> [...]
> Then person using the tapset would do something like:
> probe syscall.read { /* capture arguments */ }
> probe syscall.read.return {
>     printf("read(%s) =3D %s\n", argstr, errno_str(returnval()))
> }

It could be even smarter than that.  One can build a higher level
tapset on top of a lower level one: one specifically for tracing.
Tapset scripts are not constrained to only export functions or probe
aliases.  They can include active probes and exported global variables
too.

For example, a new "syscall-trace-read.stp" file could include both
those probes, and export a single dummy identifier (global, or
function) that, if referenced from an end-user script, would make the
translator pull in the whole file, and activate those probes.  (The
granularity of automatic inclusion is the .stp file, so a more clever
arrangement may be necessary to enable tracing without having to have
200-odd files.  Maybe grouping them?)


> This only works if there's only one instance of the return probe [...]

Yes, we need to make the maxactive parameters of e.g. systemcall
retprobes high enough to account for multiple threads blocking in one.


> By the way, I tried it without accessing the args, and the optimizer
> didn't clean it up. [...]  perhaps the optimizer isn't handling
> delete statements?

Yes, that's it.  An easy way around that for now would be to use an
ordinary assignment (=0 or ="") to clear the array slot, which renders
it released.

- FChE

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: new syscall tapset checked in, but needs work
  2006-02-01 22:59 ` Frank Ch. Eigler
@ 2006-02-02  0:15   ` Martin Hunt
  2006-02-02  2:30     ` Frank Ch. Eigler
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Hunt @ 2006-02-02  0:15 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Stone, Joshua I, systemtap

One reason you wouldn't want to only print the results of a syscall at
the return probe is because you might be mixing syscall probes with
other probes and you want to see what's happening between the time the
syscall is entered and the time it exits.  

As Frank mentioned, you can layer this stuff.  Using Josh's example and
mixing with some stuff I was playing with:

function user_strcpy_ascii:string (addr:long, n:long) %{
  long rc;
  long num = THIS->n;
  if (num > MAXSTRINGLEN)
        num = MAXSTRINGLEN;
  rc = _stp_copy_from_user (THIS->__retvalue,
                                    (const char __user*) (uintptr_t)
THIS->addr,
                                    num);
  {
	char *ptr = THIS->__retvalue;
	num -= rc; rc = num;
	while (num--) {
		if (*ptr & 128)
			*ptr = *ptr & 127;
		if (!isalnum(*ptr))
			*ptr = '.';
		ptr++;
	}
	ptr[rc] = 0;
   }
%}

global __syscall_read_buf
global __syscall_read_fd
global __syscall_read_count

probe strace.read.enter = syscall.read {
	__tid = tid()
      __syscall_read_fd[__tid] = fd
      __syscall_read_buf[__tid] = buf_uaddr
      __syscall_read_count[__tid] = count
      argstr = ""
}

probe strace.read.return = syscall.read.return {
      __tid = tid()
      fd = __syscall_read_fd[__tid]
      buf_uaddr = __syscall_read_buf[__tid]
      count = __syscall_read_count[__tid]
	if (count > 30)
		count = 30
     argstr = sprintf("%d, [%s], %d", fd, user_strcpy_ascii(buf_uaddr,
count), count)
      __syscall_read_fd[__tid] = 0
      __syscall_read_buf[__tid] = 0
      __syscall_read_count[__tid] = 0

}

probe strace.read = strace.read.enter, strace.read.return {}
probe strace.open = syscall.open, syscall.open.return {}

probe strace.* {
        if (pid() == target()) {
                if (returnp) {
			if (argstr != "")
				printf("%s(%s) = %d\n", name, argstr, returnval())
			else
				printf("%d\n", returnval())
                } else {
			if (argstr != "")
				printf("%s(%s) = ", name, argstr)
		}
        }
}
-------------------
That last part could be better.  Output looks like this:

> stap -c ps -g strace.stp
open("/etc/ld.so.cache", O_RDONLY) = 4
open("/lib/libtermcap.so.2", O_RDONLY) = 4
read(4, [.ELF....................`K#.4.], 30) = 512
open("/lib/libdl.so.2", O_RDONLY) = 4
read(4, [.ELF.....................L4.4.], 30) = 512
open("/lib/libc.so.6", O_RDONLY) = 4
read(4, [.ELF....................jU .4.], 30) = 512
open("/dev/tty", O_RDWR|O_NONBLOCK) = 4
open("/etc/mtab", O_RDONLY) = 4
read(4, [/dev/hdc1 / ext3 rw 0 0./dev/p], 30) = 372
open("/proc/meminfo", O_RDONLY) = 4
read(4, [MemTotal:      2067708 kB.MemF], 30) = 670
open("/etc/ld.so.cache", O_RDONLY) = 4
open("/lib/libproc-3.2.5.so", O_RDONLY) = 4
read(4, [.ELF....................PU".4.], 30) = 512
open("/lib/libc.so.6", O_RDONLY) = 4
read(4, [.ELF....................jU .4.], 30) = 512
open("/etc/mtab", O_RDONLY) = 4
read(4, [/dev/hdc1 / ext3 rw 0 0./dev/p], 30) = 372
open("/proc/stat", O_RDONLY) = 4
read(4, [cpu  138097 2320 33706 5096327], 30) = 745
read(4, [cpu  138097 2320 33706 5096327], 30) = 0



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: new syscall tapset checked in, but needs work
  2006-02-02  0:15   ` Martin Hunt
@ 2006-02-02  2:30     ` Frank Ch. Eigler
  0 siblings, 0 replies; 7+ messages in thread
From: Frank Ch. Eigler @ 2006-02-02  2:30 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap


hunt wrote:

> [...]
>   rc = _stp_copy_from_user (THIS->__retvalue,
>                                     [...] THIS->addr,
>                                     num);

If, as appears from the runtime sources, this particular function is
not atomic, then it should not be called from an atomic probe handler.


Here is a more compact version of your final probe:

probe strace.* {
  if (pid() != target()) next
  if (returnp) {
     if (argstr != "") printf("%s(%s) = %d\n", name, argstr, returnval())
     else printf("%d\n", returnval())
  } else if (argstr != "")
     printf("%s(%s) = ", name, argstr)
}


> [...]Output looks like this:
> 
> > stap -c ps -g strace.stp
> open("/etc/ld.so.cache", O_RDONLY) = 4
> open("/lib/libtermcap.so.2", O_RDONLY) = 4
> read(4, [.ELF....................`K#.4.], 30) = 512
> [...]

Neato.


- FChE

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: new syscall tapset checked in, but needs work
  2006-02-01 20:31 Martin Hunt
  2006-02-01 21:11 ` Frank Ch. Eigler
@ 2006-02-01 21:45 ` Hien Nguyen
  1 sibling, 0 replies; 7+ messages in thread
From: Hien Nguyen @ 2006-02-01 21:45 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap

Martin Hunt wrote:

>I've checked in 4 files in stp/src/tapset
>
>I'm working on these
>syscalls.stp          (sys_[a-m]*)
>x86_64/syscalls.stp
>
>Hien, can you take these?
>syscalls2.stp         (sys_[n-z]*)
>i686/syscalls.stp
>  
>
Sure I will do that.
Thanks, hien.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: new syscall tapset checked in, but needs work
  2006-02-01 20:31 Martin Hunt
@ 2006-02-01 21:11 ` Frank Ch. Eigler
  2006-02-01 21:45 ` Hien Nguyen
  1 sibling, 0 replies; 7+ messages in thread
From: Frank Ch. Eigler @ 2006-02-01 21:11 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap


hunt wrote:

> [...]  I'm using an strace-like output format for the args, although
> that really isn't possible for things like sys_read() where strace
> shows the results that get written in the input buffer. It is
> possible to write a systemtap script to do that, but not at the
> tapset level.

If it is possible at the script level, it's possible at the tapset
level (on purpose).  For instance, a tapset alias for syscall.read
could store away its incoming args in an private array, until its
.return (until bug #1382 is done).  All that code would be optimized
away if not used.

- FChE

^ permalink raw reply	[flat|nested] 7+ messages in thread

* new syscall tapset checked in, but needs work
@ 2006-02-01 20:31 Martin Hunt
  2006-02-01 21:11 ` Frank Ch. Eigler
  2006-02-01 21:45 ` Hien Nguyen
  0 siblings, 2 replies; 7+ messages in thread
From: Martin Hunt @ 2006-02-01 20:31 UTC (permalink / raw)
  To: Hien Nguyen; +Cc: systemtap

I've checked in 4 files in stp/src/tapset

I'm working on these
syscalls.stp          (sys_[a-m]*)
x86_64/syscalls.stp

Hien, can you take these?
syscalls2.stp         (sys_[n-z]*)
i686/syscalls.stp

Here's a quick script to test with:

probe syscall.*, syscall.*.return {
        if (pid() == target()) {
                if (returnp)
                        printf("%d\n", returnval())
                else
                        printf("%s (%s) = ", name, argstr)
        }
}

> stap -c bash sys.stp
execve () = 0
brk (0x0) = 165474304
access ("/etc/ld.so.preload", F_OK|R_OK) = -2
open ("/etc/ld.so.cache", O_RDONLY) = 4
open ("/lib/libtermcap.so.2", O_RDONLY) = 4
read (4, ..., 512) = 512
open ("/lib/libdl.so.2", O_RDONLY) = 4
read (4, ..., 512) = 512
open ("/lib/libc.so.6", O_RDONLY) = 4
read (4, ..., 512) = 512
open ("/dev/tty", O_RDWR|O_NONBLOCK) = 4
brk (0x0) = 165474304
brk (0x9df0000) = 165609472
open ("/etc/mtab", O_RDONLY) = 4
read (4, ..., 4096) = 372
open ("/proc/meminfo", O_RDONLY) = 4
read (4, ..., 1024) = 670
access ("/bin/sh", X_OK) = 0
access ("/bin/sh", X_OK) = 0
access ("/bin/bash", X_OK) = 0
access ("/bin/bash", X_OK) = 0
execve () = 0
brk (0x0) = 157106176
access ("/etc/ld.so.preload", F_OK|R_OK) = -2
open ("/etc/ld.so.cache", O_RDONLY) = 4
open ("/lib/libtermcap.so.2", O_RDONLY) = 4
read (4, ..., 512) = 512
open ("/lib/libdl.so.2", O_RDONLY) = 4
read (4, ..., 512) = 512
open ("/lib/libc.so.6", O_RDONLY) = 4
[...]

The files are mostly the old syscall tapset, divided up. I am slowly
converting to the new format. I've seen quite a few typos and a bunch of
commented out code that does userspace copies along with warnings. Those
are bogus; we should not sleep while copying from userspace.

I'm using an strace-like output format for the args, although that
really isn't possible for things like sys_read() where strace shows the
results that get written in the input buffer. It is possible to write a
systemtap script to do that, but not at the tapset level.

Martin


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-02-02  2:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-02-01 22:47 new syscall tapset checked in, but needs work Stone, Joshua I
2006-02-01 22:59 ` Frank Ch. Eigler
2006-02-02  0:15   ` Martin Hunt
2006-02-02  2:30     ` Frank Ch. Eigler
  -- strict thread matches above, loose matches on Subject: below --
2006-02-01 20:31 Martin Hunt
2006-02-01 21:11 ` Frank Ch. Eigler
2006-02-01 21:45 ` Hien Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).