[PATCH] stap/staprun do not terminate properly

public inbox for systemtap@sourceware.org

* [PATCH] stap/staprun do not terminate properly
@ 2014-03-06 21:30 Torsten Polle
  2014-03-07 21:09 ` David Smith
  0 siblings, 1 reply; 5+ messages in thread
From: Torsten Polle @ 2014-03-06 21:30 UTC (permalink / raw)
  To: systemtap

[-- Attachment #1: Type: text/plain, Size: 212 bytes --]

Hi,

I'm using uprobes-inode with task_finder2.c and ran into two problems
when I tried to terminate my probe runs.

I tested the patches with uprobes-inode and the utrace based version.

Kind Regards,
Torsten


[-- Attachment #2: 0001-Fix-Crash-when-canceling-task-work.patch --]
[-- Type: text/x-patch, Size: 1383 bytes --]

From ba7faec0af3b06f3a2660e715dbcf039bce710c8 Mon Sep 17 00:00:00 2001
Message-Id: <ba7faec0af3b06f3a2660e715dbcf039bce710c8.1394140974.git.Torsten.Polle@gmx.de>
From: Torsten Polle <Torsten.Polle@gmx.de>
Date: Thu, 6 Mar 2014 21:22:40 +0100
Subject: [PATCH 1/2] Fix: Crash when canceling task work.

As the elements of the list __stp_tf_task_work_list are removed from
the list, a safe iteration has to be used in __stp_tf_cancel_task_work().

Signed-off-by: Torsten Polle <Torsten.Polle@gmx.de>
---
 runtime/linux/task_finder2.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/runtime/linux/task_finder2.c b/runtime/linux/task_finder2.c
index 16fda87..e8f33a3 100644
--- a/runtime/linux/task_finder2.c
+++ b/runtime/linux/task_finder2.c
@@ -169,11 +169,12 @@ static void __stp_tf_free_task_work(struct task_work *work)
 static void __stp_tf_cancel_task_work(void)
 {
 	struct __stp_tf_task_work *node;
+	struct __stp_tf_task_work *tmp;
 	unsigned long flags;
 
 	// Cancel all remaining requests.
 	spin_lock_irqsave(&__stp_tf_task_work_list_lock, flags);
-	list_for_each_entry(node, &__stp_tf_task_work_list, list) {
+	list_for_each_entry_safe(node, tmp, &__stp_tf_task_work_list, list) {
 	    // Remove the item from the list, cancel it, then free it.
 	    list_del(&node->list);
 	    stp_task_work_cancel(node->task, node->work.func);
-- 
1.7.4.1
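
The reason patch 1/2 needs the safe iterator can be shown outside the
kernel. The following is a minimal userspace sketch, not the SystemTap
code: struct node, list_init(), list_add_tail_id(), drain_list() and
demo_drain() are hypothetical stand-ins for the kernel's struct list_head
and list_for_each_entry_safe(). The point is that the successor is saved
in tmp before the current entry is unlinked and freed, so the loop never
reads a freed node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an entry on a circular,
 * doubly linked kernel list. The head is a node whose id is unused. */
struct node {
    struct node *next;
    struct node *prev;
    int id;
};

/* An empty circular list is a head that points at itself. */
static void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Append a new entry before the head; returns 0 on success, -1 on OOM. */
static int list_add_tail_id(struct node *head, int id)
{
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;
    n->id = id;
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
    return 0;
}

/* Unlink and free every entry. 'tmp' always holds the successor that
 * was read while 'pos' was still valid, which is the same idea as
 * list_for_each_entry_safe(). Advancing with pos = pos->next after
 * free(pos) would read freed memory instead. */
static int drain_list(struct node *head)
{
    struct node *pos, *tmp;
    int freed = 0;

    for (pos = head->next, tmp = pos->next; pos != head;
         pos = tmp, tmp = pos->next) {
        pos->prev->next = pos->next;
        pos->next->prev = pos->prev;
        free(pos);
        freed++;
    }
    return freed;
}

/* Build a list of 'count' entries, drain it, and report how many
 * entries were freed (or -1 if allocation failed). */
static int demo_drain(int count)
{
    struct node head;
    int i, freed;

    list_init(&head);
    for (i = 0; i < count; i++)
        if (list_add_tail_id(&head, i) != 0)
            return -1;
    freed = drain_list(&head);
    assert(head.next == &head && head.prev == &head);
    return freed;
}
```

Calling demo_drain(3) from a test program walks three entries and leaves
the head pointing at itself again.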


[-- Attachment #3: 0002-Fix-stap-staprun-deadlocks-when-probing-ends.patch --]
[-- Type: text/x-patch, Size: 1617 bytes --]

From b3dc9c88682a6c02a4cdf8ba73e692e77d593ebb Mon Sep 17 00:00:00 2001
Message-Id: <b3dc9c88682a6c02a4cdf8ba73e692e77d593ebb.1394140974.git.Torsten.Polle@gmx.de>
In-Reply-To: <ba7faec0af3b06f3a2660e715dbcf039bce710c8.1394140974.git.Torsten.Polle@gmx.de>
References: <ba7faec0af3b06f3a2660e715dbcf039bce710c8.1394140974.git.Torsten.Polle@gmx.de>
From: Torsten Polle <Torsten.Polle@gmx.de>
Date: Thu, 6 Mar 2014 21:34:27 +0100
Subject: [PATCH 2/2] Fix: stap/staprun deadlocks when probing ends.

stap_stop_task_finder() exits utrace through utrace_exit(). At that
time, there might be outstanding task workers. Hence, waiting for
exiting the task work waits forever. Therefore exiting the task work
is done after canceling all task workers.

Signed-off-by: Torsten Polle <Torsten.Polle@gmx.de>
---
 runtime/linux/task_finder2.c |    2 ++
 runtime/stp_utrace.c         |    1 -
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/runtime/linux/task_finder2.c b/runtime/linux/task_finder2.c
index e8f33a3..119d04e 100644
--- a/runtime/linux/task_finder2.c
+++ b/runtime/linux/task_finder2.c
@@ -181,6 +181,8 @@ static void __stp_tf_cancel_task_work(void)
 	    _stp_kfree(node);
 	}
 	spin_unlock_irqrestore(&__stp_tf_task_work_list_lock, flags);
+
+	stp_task_work_exit();
 }
 
 static u32
diff --git a/runtime/stp_utrace.c b/runtime/stp_utrace.c
index 89ea0e4..8c948b6 100644
--- a/runtime/stp_utrace.c
+++ b/runtime/stp_utrace.c
@@ -291,7 +291,6 @@ static int utrace_exit(void)
 	if (utrace_engine_cachep)
 		kmem_cache_destroy(utrace_engine_cachep);
 
-	stp_task_work_exit();
 	return 0;
 }
 
-- 
1.7.4.1



* Re: [PATCH] stap/staprun do not terminate properly
  2014-03-06 21:30 [PATCH] stap/staprun do not terminate properly Torsten Polle
@ 2014-03-07 21:09 ` David Smith
  2014-03-07 22:11   ` Torsten Polle
  2014-03-10 22:36   ` Josh Stone
  0 siblings, 2 replies; 5+ messages in thread
From: David Smith @ 2014-03-07 21:09 UTC (permalink / raw)
  To: Torsten Polle, systemtap

On 03/06/2014 03:30 PM, Torsten Polle wrote:
> Hi,
> 
> I'm using the uprobes-inode with task_finder2.c and had two problems,
> when I wanted to terminate my probe runs.
> 
> I tested the patches with uprobes-inode and the utrace based version.
> 
> Kind Regards,
> Torsten

Torsten,

Thanks *so* much for the patches. I've seen a hang in stap around this
area, but I could never reproduce it.

I checked the 1st patch in as commit e695d46 and the 2nd patch (tweaked)
in as commit 9ee1bfe.

I tweaked the 2nd patch just a bit. Originally the flow went like:

====
stap_stop_task_finder()
{
  // ...

  // Note that utrace_exit() calls stp_task_work_exit()
  utrace_exit();

  __stp_tf_cancel_task_work();
}
====

Your patch changed it to this:

====
stap_stop_task_finder()
{
  // ...

  utrace_exit();

  // Note that __stp_tf_cancel_task_work() calls
  // stp_task_work_exit()
  __stp_tf_cancel_task_work();
}
====

I saw what you were doing, but that didn't "feel" quite right.
utrace_init() calls stp_task_work_init(), so it made sense for
utrace_exit() to call stp_task_work_exit().

So, instead I did this:

====
stap_stop_task_finder()
{
  // ...

  __stp_tf_cancel_task_work();

  // Note that utrace_exit() calls stp_task_work_exit()
  utrace_exit();
}
====

This moves canceling all outstanding task_work items before shutting
down utrace (and calling stp_task_work_exit()). I think the end result
is the same as your patch, and I think this makes a little more sense.
This way we've canceled all the task_work items before shutting down
utrace (and freeing all the memory allocated for utrace).

If this doesn't work for you or you see a hole in this logic please let
me know.
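
The ordering argument above can be sketched as a small userspace model.
This is only an illustration under stated assumptions: it assumes that
stp_task_work_exit() waits until no task_work items remain outstanding
(as the commit message of patch 2/2 describes), and that a queued item
belonging to a never-scheduled process is never run. struct tw_model and
all model_* and shutdown_* names are hypothetical, and the bounded poll
count stands in for "waits forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: number of queued task_work items that have
 * neither run nor been cancelled. */
struct tw_model {
    int outstanding;
};

/* Models queueing one task_work item for a target process. */
static void model_queue(struct tw_model *m)
{
    m->outstanding++;
}

/* Models __stp_tf_cancel_task_work(): every remaining item is
 * cancelled, so nothing is outstanding afterwards. */
static void model_cancel_all(struct tw_model *m)
{
    m->outstanding = 0;
}

/* Models the wait assumed to happen in stp_task_work_exit(). Nothing
 * in this model ever runs the queued items (the target process is
 * never scheduled), so the wait only succeeds if they were cancelled.
 * Returns true if the wait finished, false if it gave up. */
static bool model_exit_wait(const struct tw_model *m, int max_polls)
{
    int i;

    for (i = 0; i < max_polls; i++)
        if (m->outstanding == 0)
            return true;
    return false;
}

/* Original flow: wait for task work to drain, then cancel. */
static bool shutdown_exit_first(int queued, int max_polls)
{
    struct tw_model m = { 0 };
    bool finished;
    int i;

    for (i = 0; i < queued; i++)
        model_queue(&m);
    finished = model_exit_wait(&m, max_polls);
    model_cancel_all(&m);
    return finished;
}

/* Committed flow: cancel outstanding work, then wait. */
static bool shutdown_cancel_first(int queued, int max_polls)
{
    struct tw_model m = { 0 };
    int i;

    for (i = 0; i < queued; i++)
        model_queue(&m);
    model_cancel_all(&m);
    return model_exit_wait(&m, max_polls);
}
```

In this model shutdown_exit_first() only finishes when nothing was
queued, while shutdown_cancel_first() finishes regardless of how many
items were queued.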

BTW, if you have a good idea for a reproducer for the original problem
I'd like to see it. Perhaps I could add a test case for it.

Thanks again for the patches!

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)


* Re: [PATCH] stap/staprun do not terminate properly
  2014-03-07 21:09 ` David Smith
@ 2014-03-07 22:11   ` Torsten Polle
  2014-03-10 22:36   ` Josh Stone
  1 sibling, 0 replies; 5+ messages in thread
From: Torsten Polle @ 2014-03-07 22:11 UTC (permalink / raw)
  To: David Smith; +Cc: systemtap

David Smith writes:
 > On 03/06/2014 03:30 PM, Torsten Polle wrote:
 >> Hi,
 >> 
 >> I'm using the uprobes-inode with task_finder2.c and had two problems,
 >> when I wanted to terminate my probe runs.
 >> 
 >> I tested the patches with uprobes-inode and the utrace based version.
 >> 
 >> Kind Regards,
 >> Torsten

 > Torsten,

 > Thanks *so* much for the patches. I've seen a hang in stap around
 > this area, but I could never reproduce it.

David,

I have been able to reproduce the problem 100% of the time for half a
year now, but I never found the time to track down the root cause.

 > I checked the 1st patch in as commit e695d46 and the 2nd patch
 > (tweaked) in as commit 9ee1bfe.

 > I tweaked the 2nd patch just a bit. Originally the flow went like:

 > ====
 > stap_stop_task_finder()
 > {
 >   // ...

 >   // Note that utrace_exit() calls stp_task_work_exit()
 >   utrace_exit();

 >   __stp_tf_cancel_task_work();
 > }
 > ====

 > Your patch changed it to this:

 > ====
 > stap_stop_task_finder()
 > {
 >   // ...

 >   utrace_exit();

 >   // Note that __stp_tf_cancel_task_work() calls
 >   // stp_task_work_exit()
 >   __stp_tf_cancel_task_work();
 > }
 > ====

 > I saw what you were doing, but that didn't "feel" quite right.
 > utrace_init() calls stp_task_work_init(), so it made sense for
 > utrace_exit() to call stp_task_work_exit().

 > So, instead I did this:

 > ====
 > stap_stop_task_finder()
 > {
 >   // ...

 >   __stp_tf_cancel_task_work();

 >   // Note that utrace_exit() calls stp_task_work_exit()
 >   utrace_exit();
 > }
 > ====

 > This moves canceling all outstanding task_work items before shutting
 > down utrace (and calling stp_task_work_exit()). I think the end
 > result is the same as your patch, and I think this makes a little
 > more sense.  This way we've canceled all the task_work items before
 > shutting down utrace (and freeing all the memory allocated for
 > utrace).

 > If this doesn't work for you or you see a hole in this logic please
 > let me know.

I can't beat your logic. It should work for me. Unfortunately, I don't
have direct access to my target for two weeks.

 > BTW, if you have a good idea for a reproducer for the original
 > problem I'd like to see it. Perhaps I could add a test case for it.

I simply define a process probe and cross compile the module "foo" for
an ARM target. Then I run "staprun -o /tmp/probes.txt foo". After a
while I (try to) terminate the execution by "Ctrl-C".

If there is a process that is never scheduled, the task worker for the
process is never executed. Thus, staprun hangs. Usually, there are a few
processes that exhibit this behaviour on my target.

 > Thanks again for the patches!

 > -- 
 > David Smith
 > dsmith@redhat.com
 > Red Hat
 > http://www.redhat.com
 > 256.217.0141 (direct)
 > 256.837.0057 (fax)

Torsten


* Re: [PATCH] stap/staprun do not terminate properly
  2014-03-07 21:09 ` David Smith
  2014-03-07 22:11   ` Torsten Polle
@ 2014-03-10 22:36   ` Josh Stone
  2014-03-12 14:11     ` David Smith
  1 sibling, 1 reply; 5+ messages in thread
From: Josh Stone @ 2014-03-10 22:36 UTC (permalink / raw)
  To: David Smith, Torsten Polle, systemtap

On 03/07/2014 01:09 PM, David Smith wrote:
> So, instead I did this:
> 
> ====
> stap_stop_task_finder()
> {
>   // ...
> 
>   __stp_tf_cancel_task_work();
> 
>   // Note that utrace_exit() calls stp_task_work_exit()
>   utrace_exit();
> }
> ====
> 
> This moves canceling all outstanding task_work items before shutting
> down utrace (and calling stp_task_work_exit()). I think the end result
> is the same as your patch, and I think this makes a little more sense.
> This way we've canceled all the task_work items before shutting down
> utrace (and freeing all the memory allocated for utrace).
> 
> If this doesn't work for you or you see a hole in this logic please let
> me know.

I notice that utrace_exit() calls utrace_shutdown(), but so does
__stp_task_finder_cleanup() earlier in stap_stop_task_finder().  So in
fact the shutdown is still happening before canceling this work, and the
only thing left for utrace_exit() is the kmem_cache_destroy and
stp_task_work_exit().  Do you think this could be a problem?

In particular, I worry about this comment:

  /* After calling tracepoint_synchronize_unregister(), we're
   * sure there are no outstanding tracepoint probes being
   * called.  So, now would be a great time to free everything. */

There won't be any outstanding tracepoint handlers, but couldn't there
still be outstanding task_work scheduled from those handlers?


* Re: [PATCH] stap/staprun do not terminate properly
  2014-03-10 22:36   ` Josh Stone
@ 2014-03-12 14:11     ` David Smith
  0 siblings, 0 replies; 5+ messages in thread
From: David Smith @ 2014-03-12 14:11 UTC (permalink / raw)
  To: Josh Stone, Torsten Polle, systemtap, Jonathan Lebon

On 03/10/2014 05:36 PM, Josh Stone wrote:
> On 03/07/2014 01:09 PM, David Smith wrote:
>> So, instead I did this:
>>
>> ====
>> stap_stop_task_finder()
>> {
>>   // ...
>>
>>   __stp_tf_cancel_task_work();
>>
>>   // Note that utrace_exit() calls stp_task_work_exit()
>>   utrace_exit();
>> }
>> ====
>>
>> This moves canceling all outstanding task_work items before shutting
>> down utrace (and calling stp_task_work_exit()). I think the end result
>> is the same as your patch, and I think this makes a little more sense.
>> This way we've canceled all the task_work items before shutting down
>> utrace (and freeing all the memory allocated for utrace).
>>
>> If this doesn't work for you or you see a hole in this logic please let
>> me know.
> 
> I notice that utrace_exit() calls utrace_shutdown(), but so does
> __stp_task_finder_cleanup() earlier in stap_stop_task_finder().  So in
> fact the shutdown is still happening before canceling this work, and the
> only thing left for utrace_exit() is the kmem_cache_destroy and
> stp_task_work_exit().  Do you think this could be a problem?
> 
> In particular, I worry about this comment:
> 
>   /* After calling tracepoint_synchronize_unregister(), we're
>    * sure there are no outstanding tracepoint probes being
>    * called.  So, now would be a great time to free everything. */
> 
> There won't be any outstanding tracepoint handlers, but couldn't there
> still be outstanding task_work scheduled from those handlers?
> 

I went over this code yesterday, and just committed a patch (commit
6628352) that tightens things up here. This patch fixes a crash that
Jonathan was seeing. The patch makes sure that all the tracepoint probes
and task work items have finished running before the kmem caches are
freed.
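
The general rule (do not free a cache while work that may touch it is
still in flight) can be sketched in userspace as follows. This is not
what commit 6628352 does internally; it is a minimal illustration using
C11 atomics, and struct engine_cache, cache_init(), work_begin(),
work_end(), cache_try_destroy() and demo_teardown() are all hypothetical
names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kmem cache plus a count of handlers and
 * work items that may still touch objects allocated from it. */
struct engine_cache {
    atomic_int in_flight;
    int *storage;
};

/* Allocate the backing storage; returns false on allocation failure. */
static bool cache_init(struct engine_cache *c)
{
    atomic_init(&c->in_flight, 0);
    c->storage = malloc(16 * sizeof(*c->storage));
    return c->storage != NULL;
}

/* Called when a handler or work item starts using the cache. */
static void work_begin(struct engine_cache *c)
{
    atomic_fetch_add(&c->in_flight, 1);
}

/* Called when that handler or work item is done (or was cancelled). */
static void work_end(struct engine_cache *c)
{
    atomic_fetch_sub(&c->in_flight, 1);
}

/* Free the storage only if nothing is in flight. A real shutdown path
 * would wait and retry instead of giving up; this sketch just reports
 * whether the free was allowed. */
static bool cache_try_destroy(struct engine_cache *c)
{
    if (atomic_load(&c->in_flight) != 0)
        return false;
    free(c->storage);
    c->storage = NULL;
    return true;
}

/* Returns how many destroy attempts were refused, or -1 on OOM. */
static int demo_teardown(void)
{
    struct engine_cache c;
    int refused = 0;

    if (!cache_init(&c))
        return -1;
    work_begin(&c);
    if (!cache_try_destroy(&c))   /* refused: one item in flight */
        refused++;
    work_end(&c);
    if (!cache_try_destroy(&c))   /* allowed: nothing in flight */
        refused++;
    assert(c.storage == NULL);
    return refused;
}
```

In demo_teardown() the first destroy attempt is refused while one work
item is outstanding, and the second succeeds once that item has ended.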

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

