public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* sleeping, locks and debug kernels
@ 2011-12-12 16:15 Mark Wielaard
  2011-12-12 17:04 ` David Smith
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Wielaard @ 2011-12-12 16:15 UTC (permalink / raw)
  To: systemtap

Hi,

While adding the support for "dynamic" vma registration it really helped
to have a debug kernel around that had the various LOCK debugging
configs set. Especially CONFIG_LOCKDEP is pretty handy. Some distros
have a separate debug kernel around that you can (should) install when
hacking on systemtap to catch issues early (fedora has yum install
kernel-debug, after booting into it just run stap-prep to get all
necessary -devel and -debuginfo packages).

This will slow down your system a bit, but will show when code possibly
tries to sleep while in interrupt context (where sleeping is really
fatal), some accesses to resources that don't have the required locks,
threads taking locks in different order that could possible deadlock and
when code holds a lock while calling a sleeping function.

Sadly our current code is somewhat noisy because it has some of this
issues. I resolved the ones in uprobes where we weren't holding the
rcu_read_lock where necessary. And the one in the transport layer, where
we were holding a mutex over a lot of code that could potentially sleep.
Reviews of commit 262f75 and commit ab8633 appreciated since any changes
in the locking code is always a little scary.

There is one issue I don't know how to solve. That is
stap_start_task_finder() this takes a rcu_read_lock() goes over every
task, inspects each, calls utrace_attach on it if appropriate, gets the
task->mm, adds the engines to some internal datastructures, checks that
unprivileged users don't get access to utrace engines of task that
aren't theirs and then after doing that for each task releases the lock.
The problem is that utrace_attach_task() may sleep, since it must
allocate memory to create a new enginer. Which is not nice while we have
the rcu_read_lock. But I don't immediately see how to split up this loop
so that we only hold the lock while doing non-sleepy things.

Maybe task_finder2 will resolve all these issues, but we probably will
be supporting utrace and task_finder[1] for some time. Besides being a
little bad to sleep while holding that lock, and making dmesg very
noisy, it also prevents lockdep from being more useful. lockdep will
show the which locks were taken when it encounters something bad, but
only for the first such issue. Which now is almost always the
stap_start_task_finder() call. So if we could solve this one, then
running under a debug kernel would be much more useful.

Anybody got a good idea how to refactor stap_start_task_finder() to make
lockdep happy?

Thanks,

Mark

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sleeping, locks and debug kernels
  2011-12-12 16:15 sleeping, locks and debug kernels Mark Wielaard
@ 2011-12-12 17:04 ` David Smith
  2011-12-12 17:34   ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: David Smith @ 2011-12-12 17:04 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap, Oleg Nesterov

On 12/12/2011 09:21 AM, Mark Wielaard wrote:

> Hi,
> 

... sleeping issue described ...

> Reviews of commit 262f75 and commit ab8633 appreciated since any changes
> in the locking code is always a little scary.

I'll look at these later.

> There is one issue I don't know how to solve. That is
> stap_start_task_finder() this takes a rcu_read_lock() goes over every
> task, inspects each, calls utrace_attach on it if appropriate, gets the
> task->mm, adds the engines to some internal datastructures, checks that
> unprivileged users don't get access to utrace engines of task that
> aren't theirs and then after doing that for each task releases the lock.
> The problem is that utrace_attach_task() may sleep, since it must
> allocate memory to create a new enginer. Which is not nice while we have
> the rcu_read_lock. But I don't immediately see how to split up this loop
> so that we only hold the lock while doing non-sleepy things.


The task_finder2 and task_finder code is identical in this area - but
running with the task_finder2 won't cause the sleeping errors.

The task_finder2 uses our internal mini-utrace, which also allocates
memory.  But it uses GFP_IOFS instead of GFP_KERNEL when allocating
memory.  GFP_IOFS doesn't include __GFP_WAIT.  Using GFP_IOFS means
we'll never wait, but also means the memory allocation could fail.  I
was OK with that tradeoff.

I don't know how we could split up that loop.  Perhaps Oleg might have
some thoughts or might be persuaded to change the memory allocation
flags in utrace itself.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sleeping, locks and debug kernels
  2011-12-12 17:04 ` David Smith
@ 2011-12-12 17:34   ` Oleg Nesterov
  2011-12-12 18:00     ` Mark Wielaard
  2012-01-31 22:23     ` Mark Wielaard
  0 siblings, 2 replies; 9+ messages in thread
From: Oleg Nesterov @ 2011-12-12 17:34 UTC (permalink / raw)
  To: David Smith; +Cc: Mark Wielaard, systemtap

On 12/12, David Smith wrote:
>
> On 12/12/2011 09:21 AM, Mark Wielaard wrote:
>
> > There is one issue I don't know how to solve. That is
> > stap_start_task_finder() this takes a rcu_read_lock() goes over every
> > task, inspects each, calls utrace_attach on it if appropriate, gets the
> > task->mm, adds the engines to some internal datastructures, checks that
> > unprivileged users don't get access to utrace engines of task that
> > aren't theirs and then after doing that for each task releases the lock.
> > The problem is that utrace_attach_task() may sleep, since it must
> > allocate memory to create a new enginer. Which is not nice while we have
> > the rcu_read_lock. But I don't immediately see how to split up this loop
> > so that we only hold the lock while doing non-sleepy things.

Yes, this is the problem. Although I am not sure how "goes over every
task" can work under rcu_read_lock() without races with clone(). For
example, even "attach all threads in this thread group" is not simple.
But this is off-topic.

> I don't know how we could split up that loop.  Perhaps Oleg might have
> some thoughts or might be persuaded to change the memory allocation
> flags in utrace itself.

Yes, it is very simple to add UTRACE_ATTACH_CREATE_ATOMIC.

Oleg.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sleeping, locks and debug kernels
  2011-12-12 17:34   ` Oleg Nesterov
@ 2011-12-12 18:00     ` Mark Wielaard
  2011-12-12 18:34       ` Oleg Nesterov
  2012-01-31 22:23     ` Mark Wielaard
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Wielaard @ 2011-12-12 18:00 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: David Smith, systemtap

On Mon, Dec 12, 2011 at 05:58:51PM +0100, Oleg Nesterov wrote:
> On 12/12, David Smith wrote:
> >
> > On 12/12/2011 09:21 AM, Mark Wielaard wrote:
> >
> > > There is one issue I don't know how to solve. That is
> > > stap_start_task_finder() this takes a rcu_read_lock() goes over every
> > > task, inspects each, calls utrace_attach on it if appropriate, gets the
> > > task->mm, adds the engines to some internal datastructures, checks that
> > > unprivileged users don't get access to utrace engines of task that
> > > aren't theirs and then after doing that for each task releases the lock.
> > > The problem is that utrace_attach_task() may sleep, since it must
> > > allocate memory to create a new enginer. Which is not nice while we have
> > > the rcu_read_lock. But I don't immediately see how to split up this loop
> > > so that we only hold the lock while doing non-sleepy things.
> 
> Yes, this is the problem. Although I am not sure how "goes over every
> task" can work under rcu_read_lock() without races with clone(). For
> example, even "attach all threads in this thread group" is not simple.
> But this is off-topic.

Maybe off-topic, but it would be useful to know what the issues with
going over all threads while holding (or even while not holding)
rcu_read_lock. Do you have any pointers and/or recommendations?

> > I don't know how we could split up that loop.  Perhaps Oleg might have
> > some thoughts or might be persuaded to change the memory allocation
> > flags in utrace itself.
> 
> Yes, it is very simple to add UTRACE_ATTACH_CREATE_ATOMIC.

That would be useful I think.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sleeping, locks and debug kernels
  2011-12-12 18:00     ` Mark Wielaard
@ 2011-12-12 18:34       ` Oleg Nesterov
  2011-12-12 19:08         ` David Smith
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2011-12-12 18:34 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: David Smith, systemtap

On 12/12, Mark Wielaard wrote:
>
> On Mon, Dec 12, 2011 at 05:58:51PM +0100, Oleg Nesterov wrote:
> >
> > Yes, this is the problem. Although I am not sure how "goes over every
> > task" can work under rcu_read_lock() without races with clone(). For
> > example, even "attach all threads in this thread group" is not simple.
> > But this is off-topic.
>
> Maybe off-topic, but it would be useful to know what the issues with
> going over all threads while holding (or even while not holding)
> rcu_read_lock.

You can miss a freshly cloned thread. UTRACE_ATTACH_CREATE_ATOMIC makes
this easier, you can use tasklist_lock. But even in this case you probably
need to hook ->report_clone().

> > > I don't know how we could split up that loop.  Perhaps Oleg might have
> > > some thoughts or might be persuaded to change the memory allocation
> > > flags in utrace itself.
> >
> > Yes, it is very simple to add UTRACE_ATTACH_CREATE_ATOMIC.
>
> That would be useful I think.

OK, I'll try to recheck and make the patch.

Oleg.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sleeping, locks and debug kernels
  2011-12-12 18:34       ` Oleg Nesterov
@ 2011-12-12 19:08         ` David Smith
  2011-12-12 22:35           ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: David Smith @ 2011-12-12 19:08 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Mark Wielaard, systemtap

On 12/12/2011 11:54 AM, Oleg Nesterov wrote:

> On 12/12, Mark Wielaard wrote:
>>
>> On Mon, Dec 12, 2011 at 05:58:51PM +0100, Oleg Nesterov wrote:
>>>
>>> Yes, this is the problem. Although I am not sure how "goes over every
>>> task" can work under rcu_read_lock() without races with clone(). For
>>> example, even "attach all threads in this thread group" is not simple.
>>> But this is off-topic.
>>
>> Maybe off-topic, but it would be useful to know what the issues with
>> going over all threads while holding (or even while not holding)
>> rcu_read_lock.
> 
> You can miss a freshly cloned thread. UTRACE_ATTACH_CREATE_ATOMIC makes
> this easier, you can use tasklist_lock. But even in this case you probably
> need to hook ->report_clone().


We certainly do hook report_clone().

If I remember correctly, the tasklist_lock (or perhaps the functions
that access it) wasn't exported, but I haven't looked in a year so I
could be wrong now.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sleeping, locks and debug kernels
  2011-12-12 19:08         ` David Smith
@ 2011-12-12 22:35           ` Oleg Nesterov
  0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2011-12-12 22:35 UTC (permalink / raw)
  To: David Smith; +Cc: Mark Wielaard, systemtap

On 12/12, David Smith wrote:
>
> On 12/12/2011 11:54 AM, Oleg Nesterov wrote:
>
> > On 12/12, Mark Wielaard wrote:
> >>
> >> On Mon, Dec 12, 2011 at 05:58:51PM +0100, Oleg Nesterov wrote:
> >>>
> >>> Yes, this is the problem. Although I am not sure how "goes over every
> >>> task" can work under rcu_read_lock() without races with clone(). For
> >>> example, even "attach all threads in this thread group" is not simple.
> >>> But this is off-topic.
> >>
> >> Maybe off-topic, but it would be useful to know what the issues with
> >> going over all threads while holding (or even while not holding)
> >> rcu_read_lock.
> >
> > You can miss a freshly cloned thread. UTRACE_ATTACH_CREATE_ATOMIC makes
> > this easier, you can use tasklist_lock. But even in this case you probably
> > need to hook ->report_clone().
>
>
> We certainly do hook report_clone().
>
> If I remember correctly, the tasklist_lock (or perhaps the functions
> that access it) wasn't exported, but I haven't looked in a year so I
> could be wrong now.

Ah, yes, it is not exported. You can use kallsyms_lookup_name(tasklist)
but this is not nice.

Starting from 2.6.39 we have signal_struct->threadgroup_fork_lock, but
it depends on CONFIG_CGROUPS. And probably this interface will be changed
soon.

As for "attach to the whole thread-group", I think it is possible to
do this without holding rcu_read_lock or tasklist, just this is not
simple. https://www.redhat.com/archives/utrace-devel/2010-November/msg00006.html
tries to do this, see ugdb_attach_all_threads(). Not sure this is
the best approach.

Oleg.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sleeping, locks and debug kernels
  2011-12-12 17:34   ` Oleg Nesterov
  2011-12-12 18:00     ` Mark Wielaard
@ 2012-01-31 22:23     ` Mark Wielaard
  2012-02-01 13:22       ` Mark Wielaard
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Wielaard @ 2012-01-31 22:23 UTC (permalink / raw)
  To: Oleg Nesterov, fche; +Cc: David Smith, systemtap

On Mon, Dec 12, 2011 at 05:58:51PM +0100, Oleg Nesterov wrote:
> Yes, it is very simple to add UTRACE_ATTACH_CREATE_ATOMIC.

Since newer utrace now have that flag I made a patch to take
advantage of it. This makes us lockdep free (at least on recent
rawhide kernels for the systemtap.base make installcheck subset).

It is on the mjw/create_atomic branch. Please pull onto master
if we like to have this for the release.

Cheers,

Mark


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sleeping, locks and debug kernels
  2012-01-31 22:23     ` Mark Wielaard
@ 2012-02-01 13:22       ` Mark Wielaard
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Wielaard @ 2012-02-01 13:22 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: fche, David Smith, systemtap

Oleg, could you take a quick look at the second issue below, I suspect
it is a bug in systemtap's utrace usage, but maybe there is also an
utrace issue (the bug report shows some memory corruption).

On Tue, 2012-01-31 at 23:23 +0100, Mark Wielaard wrote:
> On Mon, Dec 12, 2011 at 05:58:51PM +0100, Oleg Nesterov wrote:
> > Yes, it is very simple to add UTRACE_ATTACH_CREATE_ATOMIC.
> 
> Since newer utrace now have that flag I made a patch to take
> advantage of it. This makes us lockdep free (at least on recent
> rawhide kernels for the systemtap.base make installcheck subset).
> 
> It is on the mjw/create_atomic branch.

I found a couple of small locking issues with this that I also fixed.
Now I have merged it over to master. With this (and a very recent
kernel, I have tested against 3.3.0-0.rc1.git6.1.fc17.x86_64) we seem to
have a lockdep warning free installcheck testsuite (hurray!) except for
the following two issues:

pfiles.stp does something nasty
http://sourceware.org/bugzilla/show_bug.cgi?id=13641
This example contains a ton of native guru code. It holds the rcu lock
while calling something that allocates and so can sleep, which is a no,
no. I haven't yet found what it is exactly.

More seriously though is the following:
Bad interaction between itrace and stap_stop_task_finder
http://sourceware.org/bugzilla/show_bug.cgi?id=13639
This doesn't seem like it is a new issue (Frank said he saw it before).
But it is much more observable now with all other lockdep issues gone.
There is some race between the itrace utrace engine/task removal and the
stap_stop_task_finder utrace engine/task removal. stap_stop_task_finder
calls stap_utrace_detach_ops which just goes over all tasks and tries to
remove the utrace engine from them, ignoring any errors. But during this
it looks like utrace_barrier sees an -ESRCH, does a
schedule_timeout_interruptible, but we are holding the rcu_read_lock, so
that is also a no, no... Maybe this can be "fixed" (aka papered over)
inside utrace by changing the error path to not do the interruptible
bit, but I suspect we are having some kind utrace task detaching race
between itrace and task_finder.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-02-01 13:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-12 16:15 sleeping, locks and debug kernels Mark Wielaard
2011-12-12 17:04 ` David Smith
2011-12-12 17:34   ` Oleg Nesterov
2011-12-12 18:00     ` Mark Wielaard
2011-12-12 18:34       ` Oleg Nesterov
2011-12-12 19:08         ` David Smith
2011-12-12 22:35           ` Oleg Nesterov
2012-01-31 22:23     ` Mark Wielaard
2012-02-01 13:22       ` Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).