public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Patch] Fix bug of deadlock for staprun
@ 2007-10-15  5:45 Lai Jiangshan
  2007-10-15 17:54 ` Martin Hunt
  0 siblings, 1 reply; 2+ messages in thread
From: Lai Jiangshan @ 2007-10-15  5:45 UTC (permalink / raw)
  To: systemtap

hi, all

	This patch fixed two serious bugs. These two bugs
will cause staprun deadlock:
	staprun holds some files belong to stap_XXXXXX.ko
and staprun deletes this module before closing these files.
Thus syscall delete_module will wait staprun to close these
files, but staprun also must wait for the return of syscall
delete_module before doing other things(close files). It
will cause deadlock, and root can't cleanup this deadlock,
because staprun is in "Uninterruptible sleep".

	First bug: handle_symbols() opened control_channel
and exited without closing this fd when errors occur. So if
any error occurred in handle_symbols(), it will result in
deadlock.
	The most easy way to reproduce this bug:
$ ulimit -HSn 4
	# make staprun open control_channel successfully,
	# but fail to open one more.
$ staprun -L stap_XXXXXX.ko
	# deadlock
	Just add a line "close_ctl_channel();" in cleanup()
can fix this bug.

	Second bug: staprun runs as privileged process, but
it trusts an unprivileged process stapio's return value. And
the owner of files belong to stap_XXXXXX.ko is the user who
runs staprun.

	So evil_usr(he is just a stapuser) can cause a
deadlock in this way:
step command
1    $ staprun -L stap_XXXXXX.ko
2    $ exec 3<>/sys/kernel/debug/systemtap/stap_XXXXXX/cmd
	# open one of the files belong to stap_XXXXXX.ko,
	# fd 3 will be inherited by staprun, so staprun
	# holds this file.
3    $ staprun -A stap_XXXXXX.ko &
4    # (very quickly) evil_usr uses syscall ptrace to modify
     # argument of syscall exit in stapio, make stapio exit
     # with exit_code=3.

	Thus staprun will get stapio's exit_code=3, so it will
delete stap_XXXXXX.ko which lead to a deadlock.

	In this condition, staprun gets too little information
to prevent evil_usr from doing such things. And staprun has no
enough information to return back all resource belong to
stap_XXXXXX.ko before deleting it. The best way for staprun is
that staprun cannot "Uninterruptible sleep", just left root
cleanup the mess.

	So I just added a flag O_NONBLOCK when calling
delete_module. I think this flag should be added even if the
bug does not exist.

	The bugs can be fixed by the following patch.(patch
for systemtap-snapshot-20071006, not this week)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>

diff -Nur src/runtime/staprun/staprun.c src.new/runtime/staprun/staprun.c
--- src/runtime/staprun/staprun.c	2007-08-16 02:37:21.000000000 +0900
+++ src.new/runtime/staprun/staprun.c	2007-10-15 14:07:27.000000000 +0900
@@ -123,6 +123,8 @@
 
 	if (setpriority (PRIO_PROCESS, 0, 0) < 0)
 		_perr("setpriority");
+
+	close_ctl_channel();
 	
 	/* rc == 2 means disconnected */
 	if (rc == 2)
@@ -133,7 +135,7 @@
 	if (inserted_module || rc == 3) {
 		long ret;
 		dbug(2, "removing module %s\n", modname);
-		ret = do_cap(CAP_SYS_MODULE, delete_module, modname, 0);
+		ret = do_cap(CAP_SYS_MODULE, delete_module, modname, O_NONBLOCK);
 		if (ret != 0)
 			err("Error removing module '%s': %s\n", modname, moderror(errno));
 	}



^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [Patch] Fix bug of deadlock for staprun
  2007-10-15  5:45 [Patch] Fix bug of deadlock for staprun Lai Jiangshan
@ 2007-10-15 17:54 ` Martin Hunt
  0 siblings, 0 replies; 2+ messages in thread
From: Martin Hunt @ 2007-10-15 17:54 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: systemtap

On Mon, 2007-10-15 at 14:47 +0900, Lai Jiangshan wrote:

> 	First bug: handle_symbols() opened control_channel
> and exited without closing this fd when errors occur. So if
> any error occurred in handle_symbols(), it will result in
> deadlock.
> 	The most easy way to reproduce this bug:
> $ ulimit -HSn 4
> 	# make staprun open control_channel successfully,
> 	# but fail to open one more.
> $ staprun -L stap_XXXXXX.ko
> 	# deadlock
> 	Just add a line "close_ctl_channel();" in cleanup()
> can fix this bug.

I never figured out how to create this, but I did notice the possibility
that it could happen and fixed it last week as part of a rewrite of the
control channel code.

> 
> 	Second bug: staprun runs as privileged process, but
> it trusts an unprivileged process stapio's return value. And
> the owner of files belong to stap_XXXXXX.ko is the user who
> runs staprun.
> 
> 	So evil_usr(he is just a stapuser) can cause a
> deadlock in this way:
> step command
> 1    $ staprun -L stap_XXXXXX.ko
> 2    $ exec 3<>/sys/kernel/debug/systemtap/stap_XXXXXX/cmd
> 	# open one of the files belong to stap_XXXXXX.ko,
> 	# fd 3 will be inherited by staprun, so staprun
> 	# holds this file.
> 3    $ staprun -A stap_XXXXXX.ko &
> 4    # (very quickly) evil_usr uses syscall ptrace to modify
>      # argument of syscall exit in stapio, make stapio exit
>      # with exit_code=3.
> 
> 	Thus staprun will get stapio's exit_code=3, so it will
> delete stap_XXXXXX.ko which lead to a deadlock.
> 
> 	In this condition, staprun gets too little information
> to prevent evil_usr from doing such things. And staprun has no
> enough information to return back all resource belong to
> stap_XXXXXX.ko before deleting it. The best way for staprun is
> that staprun cannot "Uninterruptible sleep", just left root
> cleanup the mess.
> 
> 	So I just added a flag O_NONBLOCK when calling
> delete_module. I think this flag should be added even if the
> bug does not exist.

I guess I don't see the problem here.  The user, with a bit of
creativity, can cause staprun to not exit until some resource he
has managed to grab is freed.  I did this intentionally because I wanted
it to be obvious when this was happening.  What harm does this cause and
why would it be better to have modules silently hanging around pending
deletion? Wouldn't this cause more confusion when the user tries to run
the same module later and it won't load because it still exists?

Martin


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2007-10-15 17:54 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-15  5:45 [Patch] Fix bug of deadlock for staprun Lai Jiangshan
2007-10-15 17:54 ` Martin Hunt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).