public inbox for systemtap@sourceware.org
* Using systemtap for rewriting syscalls?
From: Riccardo Murri @ 2016-04-05 21:20 UTC
  To: systemtap

Hello,

I'm completely new to systemtap, so please pardon me if this question
is trivial or is already answered in the docs -- there's so much to read!

On one of the systems I manage, a directory has become unreadable but
I cannot take the system down for repairs for some time.  However,
directories below it are fine, and the unreadable directory is only an
issue with programs that recursively perform `lstat()` on every path
component.

Would it be possible to use systemtap to live-patch the system to
return a fixed value for `lstat()` if the path argument is the path to
the unreadable dir?  In pseudo-code:

    probe kernel.syscall.lstat {
      if (argument_path == "/path/to/bad/dir") {
        /* return fake statbuf */
        memcpy({.st_dev=..., .st_ino=..., ...}, argument_buf);
      } else {
        /* forward call to kernel */
        do_real_lstat(argument_buf, argument_path);
      }
    }

If that makes sense, where can I start looking for examples to adapt
and/or relevant documentation?

Thank you very much for any hint!

Riccardo



* Re: Using systemtap for rewriting syscalls?
From: Josh Stone @ 2016-04-05 23:55 UTC
  To: Riccardo Murri, systemtap

In general, stap can't completely replace a function.  However, with
guru mode "-g" you can often change parameters to cause a function to
take a shorter error path.  This requires looking at the function in
question to know how it works, of course.

In this case, syscalls get a little hairy in how arguments are
represented, thanks to the SYSCALL_DEFINE wrappers, which add a layer of
inlined argument casting from long register values.  Plus, the output of
lstat has to be written to user memory, which is possible but not so easy.

So I'd suggest probing this one a little bit down the call chain, in
vfs_fstatat.  On entry, check your conditions and fill in the mock
values, then trigger a quick error.  Then catch the return to correct
the error back to a mock success.  Something like this:

// Needs guru mode (stap -g): the probes write to $stat, $flag and $return.
global mocked;
probe kernel.function("vfs_fstatat").call {
  if (user_string($filename) == "/path/to/bad/dir") {
    mocked[tid()] = 1;  // remember for the .return
    $stat->dev = 123;
    $stat->ino = 456;
    // ...
    $flag = -1;  // bad flag bits will trigger EINVAL
  }
}
probe kernel.function("vfs_fstatat").return {
  if (tid() in mocked) {
    delete mocked[tid()];
    $return = 0; // mock success!
  }
}


PS- we're also assuming here that stats on a fully-specified filename
"/path/to/bad/dir" are the only thing you need to squash.  There are
lots of indirect ways a given path might be reached, so I hope this is
really enough for you...


* Re: Using systemtap for rewriting syscalls?
From: Riccardo Murri @ 2016-04-07 19:46 UTC
  To: Josh Stone; +Cc: systemtap

Hi Josh,

> global mocked;
> probe kernel.function("vfs_fstatat").call {
>   if (user_string($filename) == "/path/to/bad/dir") {
>     mocked[tid()] = 1;  // remember for the .return
>     $stat->dev = 123;
>     $stat->ino = 456;
>     // ...
>     $flag = -1;  // bad flag bits will trigger EINVAL
>   }
> }
> probe kernel.function("vfs_fstatat").return {
>   if (tid() in mocked) {
>     delete mocked[tid()];
>     $return = 0; // mock success!
>   }
> }
>

Thanks, this is exactly what I was looking for!

Unfortunately, I was not able to try it out because the "debuginfo"
package was missing from the repos, so in the end I had to reboot anyway.
But I'll keep it in the toolbox in case the same issue happens again.

Thank you very much!

Riccardo

