* [PATCH] stap/staprun do not terminate properly
From: Torsten Polle @ 2014-03-06 21:30 UTC
To: systemtap
Hi,
I'm using the uprobes-inode with task_finder2.c and had two problems,
when I wanted to terminate my probe runs.
I tested the patches with uprobes-inode and the utrace based version.
Kind Regards,
Torsten
[-- Attachment #2: 0001-Fix-Crash-when-canceling-task-work.patch --]
[-- Type: text/x-patch, Size: 1383 bytes --]
From ba7faec0af3b06f3a2660e715dbcf039bce710c8 Mon Sep 17 00:00:00 2001
Message-Id: <ba7faec0af3b06f3a2660e715dbcf039bce710c8.1394140974.git.Torsten.Polle@gmx.de>
From: Torsten Polle <Torsten.Polle@gmx.de>
Date: Thu, 6 Mar 2014 21:22:40 +0100
Subject: [PATCH 1/2] Fix: Crash when canceling task work.
As the elements of the list __stp_tf_task_work_list are removed from
the list, a safe iteration has to be used in __stp_tf_cancel_task_work().
Signed-off-by: Torsten Polle <Torsten.Polle@gmx.de>
---
runtime/linux/task_finder2.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/runtime/linux/task_finder2.c b/runtime/linux/task_finder2.c
index 16fda87..e8f33a3 100644
--- a/runtime/linux/task_finder2.c
+++ b/runtime/linux/task_finder2.c
@@ -169,11 +169,12 @@ static void __stp_tf_free_task_work(struct task_work *work)
static void __stp_tf_cancel_task_work(void)
{
struct __stp_tf_task_work *node;
+ struct __stp_tf_task_work *tmp;
unsigned long flags;
// Cancel all remaining requests.
spin_lock_irqsave(&__stp_tf_task_work_list_lock, flags);
- list_for_each_entry(node, &__stp_tf_task_work_list, list) {
+ list_for_each_entry_safe(node, tmp, &__stp_tf_task_work_list, list) {
// Remove the item from the list, cancel it, then free it.
list_del(&node->list);
stp_task_work_cancel(node->task, node->work.func);
--
1.7.4.1
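The reason patch 1/2 needs the _safe iterator can be shown in user space. The following is a minimal sketch, not the runtime code: `struct list_head` is re-implemented here, and `list_init`, `list_add_tail_`, `list_del_`, `node_of`, `struct work_node`, `fill` and `cancel_all` are hypothetical stand-ins for illustration only. The point is that the successor must be saved before the current node is unlinked and freed, which is what list_for_each_entry_safe() does with its extra cursor.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail_(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;   /* the node's links are no longer usable */
}

/* Hypothetical payload, loosely modelled on struct __stp_tf_task_work. */
struct work_node { int id; struct list_head list; };

#define node_of(p) \
    ((struct work_node *)((char *)(p) - offsetof(struct work_node, list)))

/* Append n heap-allocated nodes; returns 0 on success, -1 on failure. */
static int fill(struct list_head *head, int n)
{
    for (int i = 0; i < n; i++) {
        struct work_node *w = malloc(sizeof(*w));
        if (!w)
            return -1;
        w->id = i;
        list_add_tail_(&w->list, head);
    }
    return 0;
}

/* Unlink and free every node. 'tmp' always holds the successor that was
 * read while 'pos' was still linked, so freeing 'pos' is harmless.
 * Advancing with pos = pos->next after list_del_()/free() would instead
 * read from memory that is already unlinked and freed. */
static int cancel_all(struct list_head *head)
{
    struct list_head *pos, *tmp;
    int freed = 0;

    for (pos = head->next, tmp = pos->next; pos != head;
         pos = tmp, tmp = pos->next) {
        list_del_(pos);
        free(node_of(pos));
        freed++;
    }
    return freed;
}
```

Calling fill() on an initialized head and then cancel_all() empties the list again; a second cancel_all() on the empty list does nothing.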
[-- Attachment #3: 0002-Fix-stap-staprun-deadlocks-when-probing-ends.patch --]
[-- Type: text/x-patch, Size: 1617 bytes --]
From b3dc9c88682a6c02a4cdf8ba73e692e77d593ebb Mon Sep 17 00:00:00 2001
Message-Id: <b3dc9c88682a6c02a4cdf8ba73e692e77d593ebb.1394140974.git.Torsten.Polle@gmx.de>
In-Reply-To: <ba7faec0af3b06f3a2660e715dbcf039bce710c8.1394140974.git.Torsten.Polle@gmx.de>
References: <ba7faec0af3b06f3a2660e715dbcf039bce710c8.1394140974.git.Torsten.Polle@gmx.de>
From: Torsten Polle <Torsten.Polle@gmx.de>
Date: Thu, 6 Mar 2014 21:34:27 +0100
Subject: [PATCH 2/2] Fix: stap/staprun deadlocks when probing ends.
stap_stop_task_finder() exits utrace through utrace_exit(), which calls
stp_task_work_exit(). At that time, there might still be outstanding
task workers, so stp_task_work_exit() waits forever. Therefore
stp_task_work_exit() is now called after all task workers have been
canceled.
Signed-off-by: Torsten Polle <Torsten.Polle@gmx.de>
---
runtime/linux/task_finder2.c | 2 ++
runtime/stp_utrace.c | 1 -
2 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/runtime/linux/task_finder2.c b/runtime/linux/task_finder2.c
index e8f33a3..119d04e 100644
--- a/runtime/linux/task_finder2.c
+++ b/runtime/linux/task_finder2.c
@@ -181,6 +181,8 @@ static void __stp_tf_cancel_task_work(void)
_stp_kfree(node);
}
spin_unlock_irqrestore(&__stp_tf_task_work_list_lock, flags);
+
+ stp_task_work_exit();
}
static u32
diff --git a/runtime/stp_utrace.c b/runtime/stp_utrace.c
index 89ea0e4..8c948b6 100644
--- a/runtime/stp_utrace.c
+++ b/runtime/stp_utrace.c
@@ -291,7 +291,6 @@ static int utrace_exit(void)
if (utrace_engine_cachep)
kmem_cache_destroy(utrace_engine_cachep);
- stp_task_work_exit();
return 0;
}
--
1.7.4.1
* Re: [PATCH] stap/staprun do not terminate properly
From: David Smith @ 2014-03-07 21:09 UTC
To: Torsten Polle, systemtap
On 03/06/2014 03:30 PM, Torsten Polle wrote:
> Hi,
>
> I'm using the uprobes-inode with task_finder2.c and had two problems,
> when I wanted to terminate my probe runs.
>
> I tested the patches with uprobes-inode and the utrace based version.
>
> Kind Regards,
> Torsten
Torsten,
Thanks *so* much for the patches. I've seen a hang in stap around this
area, but I could never reproduce it.
I checked the 1st patch in as commit e695d46 and the 2nd patch (tweaked)
in as commit 9ee1bfe.
I tweaked the 2nd patch just a bit. Originally the flow went like:
====
stap_stop_task_finder()
{
// ...
// Note that utrace_exit() calls stp_task_work_exit()
utrace_exit();
__stp_tf_cancel_task_work();
}
====
Your patch changed it to this:
====
stap_stop_task_finder()
{
// ...
utrace_exit();
// Note that __stp_tf_cancel_task_work() calls
// stp_task_work_exit()
__stp_tf_cancel_task_work();
}
====
I saw what you were doing, but that didn't "feel" quite right.
utrace_init() calls stp_task_work_init(), so it made sense for
utrace_exit() to call stp_task_work_exit().
So, instead I did this:
====
stap_stop_task_finder()
{
// ...
__stp_tf_cancel_task_work();
// Note that utrace_exit() calls stp_task_work_exit()
utrace_exit();
}
====
This moves canceling all outstanding task_work items before shutting
down utrace (and calling stp_task_work_exit()). I think the end result
is the same as your patch, and I think this makes a little more sense.
This way we've canceled all the task_work items before shutting down
utrace (and freeing all the memory allocated for utrace).
If this doesn't work for you or you see a hole in this logic please let
me know.
BTW, if you have a good idea for a reproducer for the original problem
I'd like to see it. Perhaps I could add a test case for it.
Thanks again for the patches!
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
* Re: [PATCH] stap/staprun do not terminate properly
From: Torsten Polle @ 2014-03-07 22:11 UTC
To: David Smith; +Cc: systemtap
David Smith writes:
> On 03/06/2014 03:30 PM, Torsten Polle wrote:
>> Hi,
>>
>> I'm using the uprobes-inode with task_finder2.c and had two problems,
>> when I wanted to terminate my probe runs.
>>
>> I tested the patches with uprobes-inode and the utrace based version.
>>
>> Kind Regards,
>> Torsten
> Torsten,
> Thanks *so* much for the patches. I've seen a hang in stap around
> this area, but I could never reproduce it.
David,
I have been able to reproduce the problem 100% of the time for half a
year now, but I never found the time to track down the root cause.
> I checked the 1st patch in as commit e695d46 and the 2nd patch
> (tweaked) in as commit 9ee1bfe.
> I tweaked the 2nd patch just a bit. Originally the flow went like:
> ====
> stap_stop_task_finder()
> {
> // ...
> // Note that utrace_exit() calls stp_task_work_exit()
> utrace_exit();
> __stp_tf_cancel_task_work();
> }
> ====
> Your patch changed it to this:
> ====
> stap_stop_task_finder()
> {
> // ...
> utrace_exit();
> // Note that __stp_tf_cancel_task_work() calls
> // stp_task_work_exit()
> __stp_tf_cancel_task_work();
> }
> ====
> I saw what you were doing, but that didn't "feel" quite right.
> utrace_init() calls stp_task_work_init(), so it made sense for
> utrace_exit() to call stp_task_work_exit().
> So, instead I did this:
> ====
> stap_stop_task_finder()
> {
> // ...
> __stp_tf_cancel_task_work();
> // Note that utrace_exit() calls stp_task_work_exit()
> utrace_exit();
> }
> ====
> This moves canceling all outstanding task_work items before shutting
> down utrace (and calling stp_task_work_exit()). I think the end
> result is the same as your patch, and I think this makes a little
> more sense. This way we've canceled all the task_work items before
> shutting down utrace (and freeing all the memory allocated for
> utrace).
> If this doesn't work for you or you see a hole in this logic please
> let me know.
I can't beat your logic. It should work for me. Unfortunately, I don't
have direct access to my target for two weeks.
> BTW, if you have a good idea for a reproducer for the original
> problem I'd like to see it. Perhaps I could add a test case for it.
I simply define a process probe and cross compile the module "foo" for
an ARM target. Then I run "staprun -o /tmp/probes.txt foo". After a
while I (try to) terminate the execution by "Ctrl-C".
If there is a process that is never scheduled, the task worker for the
process is never executed. Thus, staprun hangs. Usually, there are a few
processes that exhibit this behaviour on my target.
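The hang described above can be modelled in a few lines. This is a sketch under stated assumptions, not the runtime code: it assumes that stp_task_work_exit() waits until no task work is outstanding, and it replaces the actual wait with a predicate so that the example terminates. `struct tw_state`, `tw_queue`, `tw_cancel_all`, `tw_exit_would_block`, `shutdown_exit_first` and `shutdown_cancel_first` are hypothetical names.

```c
#include <stdbool.h>

/* Number of queued task_work items that have not run yet, e.g. because
 * their target process is never scheduled. */
struct tw_state { int outstanding; };

/* Models queuing one task_work item for some process. */
static void tw_queue(struct tw_state *s) { s->outstanding++; }

/* Models __stp_tf_cancel_task_work(): every pending item is removed. */
static void tw_cancel_all(struct tw_state *s) { s->outstanding = 0; }

/* Models stp_task_work_exit(). The real function is assumed to wait
 * until nothing is outstanding; here we only report whether it would
 * have to wait, so that the sketch cannot hang. */
static bool tw_exit_would_block(const struct tw_state *s)
{
    return s->outstanding > 0;
}

/* Original order: exit first, cancel afterwards. If anything is still
 * queued, the exit step never returns and the cancel step is never
 * reached. Returns true if the shutdown would hang. */
static bool shutdown_exit_first(struct tw_state *s)
{
    bool blocked = tw_exit_would_block(s);
    if (!blocked)
        tw_cancel_all(s);
    return blocked;
}

/* Fixed order: cancel first, then exit. Nothing is outstanding when
 * the exit step runs. Returns true if the shutdown would hang. */
static bool shutdown_cancel_first(struct tw_state *s)
{
    tw_cancel_all(s);
    return tw_exit_would_block(s);
}
```

With at least one item queued, shutdown_exit_first() reports a hang while shutdown_cancel_first() does not, which matches the behaviour seen on Ctrl-C.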
> Thanks again for the patches!
> --
> David Smith
> dsmith@redhat.com
> Red Hat
> http://www.redhat.com
> 256.217.0141 (direct)
> 256.837.0057 (fax)
Torsten
* Re: [PATCH] stap/staprun do not terminate properly
From: Josh Stone @ 2014-03-10 22:36 UTC
To: David Smith, Torsten Polle, systemtap
On 03/07/2014 01:09 PM, David Smith wrote:
> So, instead I did this:
>
> ====
> stap_stop_task_finder()
> {
> // ...
>
> __stp_tf_cancel_task_work();
>
> // Note that utrace_exit() calls stp_task_work_exit()
> utrace_exit();
> }
> ====
>
> This moves canceling all outstanding task_work items before shutting
> down utrace (and calling stp_task_work_exit()). I think the end result
> is the same as your patch, and I think this makes a little more sense.
> This way we've canceled all the task_work items before shutting down
> utrace (and freeing all the memory allocated for utrace).
>
> If this doesn't work for you or you see a hole in this logic please let
> me know.
I notice that utrace_exit() calls utrace_shutdown(), but so does
__stp_task_finder_cleanup() earlier in stap_stop_task_finder(). So in
fact the shutdown is still happening before canceling this work, and the
only thing left for utrace_exit() is the kmem_cache_destroy and
stp_task_work_exit(). Do you think this could be a problem?
In particular, I worry about this comment:
/* After calling tracepoint_synchronize_unregister(), we're
* sure there are no outstanding tracepoint probes being
* called. So, now would be a great time to free everything. */
There won't be any outstanding tracepoint handlers, but couldn't there
still be outstanding task_work scheduled from those handlers?
* Re: [PATCH] stap/staprun do not terminate properly
From: David Smith @ 2014-03-12 14:11 UTC
To: Josh Stone, Torsten Polle, systemtap, Jonathan Lebon
On 03/10/2014 05:36 PM, Josh Stone wrote:
> On 03/07/2014 01:09 PM, David Smith wrote:
>> So, instead I did this:
>>
>> ====
>> stap_stop_task_finder()
>> {
>> // ...
>>
>> __stp_tf_cancel_task_work();
>>
>> // Note that utrace_exit() calls stp_task_work_exit()
>> utrace_exit();
>> }
>> ====
>>
>> This moves canceling all outstanding task_work items before shutting
>> down utrace (and calling stp_task_work_exit()). I think the end result
>> is the same as your patch, and I think this makes a little more sense.
>> This way we've canceled all the task_work items before shutting down
>> utrace (and freeing all the memory allocated for utrace).
>>
>> If this doesn't work for you or you see a hole in this logic please let
>> me know.
>
> I notice that utrace_exit() calls utrace_shutdown(), but so does
> __stp_task_finder_cleanup() earlier in stap_stop_task_finder(). So in
> fact the shutdown is still happening before canceling this work, and the
> only thing left for utrace_exit() is the kmem_cache_destroy and
> stp_task_work_exit(). Do you think this could be a problem?
>
> In particular, I worry about this comment:
>
> /* After calling tracepoint_synchronize_unregister(), we're
> * sure there are no outstanding tracepoint probes being
> * called. So, now would be a great time to free everything. */
>
> There won't be any outstanding tracepoint handlers, but couldn't there
> still be outstanding task_work scheduled from those handlers?
>
I went over this code yesterday, and just committed a patch (commit
6628352) that tightens things up here. This patch fixes a crash that
Jonathan was seeing. The patch makes sure that all the tracepoint probes
and task work items are finished running before the kmem caches are freed.
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)