Making the transport layer more robust

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

* Making the transport layer more robust
@ 2011-07-19  8:59 Mark Wielaard
  2011-07-19 11:42 ` Mark Wielaard
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Mark Wielaard @ 2011-07-19  8:59 UTC (permalink / raw)
  To: systemtap

Hi,

Since I was noticing some crashes related to Bug 12960 - _stp_ctl_send
tries to msleep when out of memory, I went through the transport layer
and cleaned some stuff up. This also solves Bug 10189 - STP_START gets
lost in a warning flood. The whole thing now feels a lot more robust. I
added a testcase testsuite/systemtap.base/warn_overflow.stp that often
could crash my machine. Now it never crashes and staprun exists nicely
while providing some warning messages about the warn/system related
floods and the module gets cleanly unloaded.

Since this was somewhat tricky most of the patches just add a lot of
comments about the transport layers, the message types and the
interactions between the messages and locks used in the code. It would
be nice if someone could read through the patches to see if there is
anything I missed or misunderstood.

I tested on f14 x86_64 and rhel5 x86_64 on which I could reproduce the
crashers previously and they are totally gone. The test results on f14
x86_64 look pretty good. I am still investigating the test results on
rhel5, they don't look that good (in particular exelib.exp fails a lot
of tests because it is unable to find process.mark probe points), but I
don't believe any of the failures is new or caused by my changes. Except
maybe one on rhel5 testing some abnormal exit case, pr10854.exp.

pr10854.exp acts strangely on rhel5, it seems fine on f14. It just sits
there waiting the reap staprun, which will never happen since it tries
to pkill it at the same time, that could be because the startup/exit of
staprun/stapio is much more robust now, but I don't fully understand the
expect spawn, catch, wait logic. Maybe it is some strange bug in the
rhel5 expect? Maybe I changed some expectation of staprun/stapio/module
interaction? Any help understanding the expect logic would be
appreciated.

Apart from that stap feels a lot more robust now. I do occasionally see
kernel message "hrtimer: interrupt took 30014948 ns" while running
(multiple) make installchecks, which is somewhat worrying, but doesn't
immediately seem to be bad nor new (I have seen these before some time
back).

I haven't yet tested on really old kernels, like those used on rhel4,
where we are using an almost completely different transport layer
implementation (even using different message types in some cases). I'll
try and find such an old machine and test a bit (does anybody still use
such an old beast with systemtap git trunk?). But in general I think we
should deprecate this part of the transport layer. It makes things
really complex.

I hope that after the next release I can cleanup the transport layer a
bit more. In particular I think we should really get rid of the
difference between _stp_ctl_send and _stp_ctl_write, which leads to
needing an extra timer just to flush the messages from the queue (and
check for the exit condition). If we can replace that logic with just a
wait queue based trigger then the code could become a little simpler.
The extra timer triggers made analyzing the above two bugs that much
harder.

Again, patch review of the commits would be really appreciated. If only
to make sure there are others that get to know this part of the code a
bit more.

Thanks,

Mark

commit d8edf750d3c8815784a32c96c9fd939ce5e2f85b
Author: Mark Wielaard <mjw@redhat.com>
Date:   Mon Jul 18 20:55:38 2011 +0200

    PR10189 and PR12960 reserve system cmd messages for delivery.

    runtime/transport/control.c kept one pool for all cmd messages that
    the module had to deliver to staprun/io.  This pool could become
    empty.  This meant essential control message would not be delivered.
    Leading to the module not properly starting and/or exiting.

    We now set aside buffers for one time messages (STP_START, STP_EXIT,
    STP_TRANSPORT, STAP_REQUEST_EXIT) and "overflow" messages that get
    delivered whenever one of the dynamically allocated messages cannot
    get a free slot from the pool (STP_OOB_DATA - warnings and errors,
    STP_SYSTEM and STP_REALTIME_DATA).

    The type field is used to mark whether or not a special pre-allocated
    buffer is currently unused. This needs careful locking using a new
    &_stp_ctl_special_msg_lock that is used in the new helper functions
    _stp_ctl_get_buffer and _stp_ctl_free_buffer.

    Now when we run out of message buffers we just drop the message and
    printk. stapio will have received either the one time message or an
    overflow message, there is nothing more we can do.

    The STP_DEFAULT_BUFFERS for debugfs.c got decreased again to allow
    8 pre-allocated and 32 dynamic (pending) cmd messages.

    A new testcase testsuite/systemtap.base/warn_overflow.exp was added.

commit 8a581846923f5bd210644a05add78e13ed5591f1
Author: Mark Wielaard <mjw@redhat.com>
Date:   Mon Jul 18 16:11:11 2011 +0200

    staprun STP_OOB_DATA payload.data "WARNING:" prefix shouldn't be translated.

    The STP_OOB_DATA payload data prefix "WARNING:" or "ERROR:" is part of
    the module cmd protocol.

commit 3eafa6b8fd7e016eb5236815f84844d87cb5c708
Author: Mark Wielaard <mjw@redhat.com>
Date:   Fri Jul 15 23:54:47 2011 +0200

    PR12960 Don't msleep in _stp_ctl_send when out of memory.

    This is mainly a documentation patch to better explain the transport
    layers and the interaction between _stp_ctl_read_cmd, _stp_ctl_send and
    _stp_ctl_write.

    It also contains the first step to resolve PR12960. The msleep() in
    _stp_ctl_send() has been replaced with a loop that checks whether there
    are messages on the queue, tries to wake up _stp_ctl_read_cmd so stapio
    has a change to read some of the pending messages and a small mdelay
    (which is save, because it doesn't actually sleep or schedule). It
    only prevents the crash and makes the possibility of loosing control
    messages slightly less. A followup patch will introduce special buffers
    to hold cannot be lost messages so the module will always be able to
    properly shut down.

    STP_DEFAULT_BUFFERS for debugfs also got increased a little from 50 to 64.

commit c714669f502d29e9c5317a648986709be2ff7454
Author: Mark Wielaard <mjw@redhat.com>
Date:   Fri Jul 15 17:06:52 2011 +0200

    Create dropped file with mode 0400 in relay_v2.c to make it not world readable.

commit 27d0d0cabb7452088ab5336195b0a07ee1413fda
Author: Mark Wielaard <mjw@redhat.com>
Date:   Fri Jul 15 15:11:41 2011 +0200

    Document stp control channel command values under runtime/transport.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-07-19  8:59 Making the transport layer more robust Mark Wielaard
@ 2011-07-19 11:42 ` Mark Wielaard
  2011-07-19 15:03 ` Mark Wielaard
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2011-07-19 11:42 UTC (permalink / raw)
  To: systemtap

On Tue, 2011-07-19 at 10:58 +0200, Mark Wielaard wrote:
> I haven't yet tested on really old kernels, like those used on rhel4,
> where we are using an almost completely different transport layer
> implementation (even using different message types in some cases).
> I'll try and find such an old machine and test a bit (does anybody
> still use such an old beast with systemtap git trunk?). But in general
> I think we should deprecate this part of the transport layer. It makes
> things really complex.

I found a RHEL4-U8 x86_64 install which uses a 2.6.9 (!) kernel. I fixed
things up so that it could be build and run at least the evil transport
layer testcase, and that seems to also work fine on such an old system.

But it would be nice to know whether people really expect systemtap
trunk to work on such old systems. Looking at some of the test failures
there is a lot that just doesn't work. And I was actually able to crash
the system (unrelated to the transport layer, but our kprobe entry code
seems flaky on such an old system). Is it really worth it to keep
supporting it? Dropping such old kernels would certainly make it
possible to clean up some tricky code.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-07-19  8:59 Making the transport layer more robust Mark Wielaard
  2011-07-19 11:42 ` Mark Wielaard
@ 2011-07-19 15:03 ` Mark Wielaard
  2011-07-20  8:29   ` Mark Wielaard
  2011-07-19 15:05 ` William Cohen
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Mark Wielaard @ 2011-07-19 15:03 UTC (permalink / raw)
  To: systemtap

On Tue, 2011-07-19 at 10:58 +0200, Mark Wielaard wrote:
> pr10854.exp acts strangely on rhel5, it seems fine on f14. It just sits
> there waiting the reap staprun, which will never happen since it tries
> to pkill it at the same time, that could be because the startup/exit of
> staprun/stapio is much more robust now, but I don't fully understand the
> expect spawn, catch, wait logic. Maybe it is some strange bug in the
> rhel5 expect? Maybe I changed some expectation of staprun/stapio/module
> interaction? Any help understanding the expect logic would be
> appreciated.

I think I narrowed this down to the following commit:

commit 5c854d7ca64df766c581c9ed7ff81e04c9d1fa4d
Author: Chris Meek <cmeek@redhat.com>
Date:   Wed Jul 13 10:31:47 2011 -0400

    PR12890: Renaming modules in Staprun

Although it is somewhat hard to say, because it doesn't always fail. But
I have never seen it fail before this commit.

Still trying to understand the real issue and the testcase though. So
all help appreciated.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-07-19  8:59 Making the transport layer more robust Mark Wielaard
  2011-07-19 11:42 ` Mark Wielaard
  2011-07-19 15:03 ` Mark Wielaard
@ 2011-07-19 15:05 ` William Cohen
  2011-07-20 14:13 ` Mark Wielaard
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: William Cohen @ 2011-07-19 15:05 UTC (permalink / raw)
  To: systemtap

On 07/19/2011 04:58 AM, Mark Wielaard wrote:

> pr10854.exp acts strangely on rhel5, it seems fine on f14. It just sits
> there waiting the reap staprun, which will never happen since it tries
> to pkill it at the same time, that could be because the startup/exit of
> staprun/stapio is much more robust now, but I don't fully understand the
> expect spawn, catch, wait logic. Maybe it is some strange bug in the
> rhel5 expect? Maybe I changed some expectation of staprun/stapio/module
> interaction? Any help understanding the expect logic would be
> appreciated.

Another data point. PR10854.exp also acts strangely on the fedora 13 arm running a custom kernel from:

 git://gitorious.org/efikamx/linux-kernel.git

$ uname -a
Linux smartbook-fedora-arm 2.6.31.14.24-efikamx #1 PREEMPT Mon Jul 18 21:32:54 EDT 2011 armv7l armv7l armv7l GNU/Linux

Ran with:

make installcheck RUNTESTFLAGS="--debug systemtap.base/pr10854.exp"

tail -f testsuite/systemtap.log:

Running /media/pata/home/wcohen_fedora/systemtap_write/systemtap/testsuite/systemtap.base/pr10854.exp ...
PASS: compiling pr10854.stp
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")
trap: setting up signal 11 ("SIGSEGV")

Seems to wait and wait. Hit control-c. then get the following in log:

got a INT signal, interrupted by user 

		=== systemtap Summary ===

# of expected passes		1
runtest completed at Tue Jul 19 11:04:56 2011

-Will

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-07-19 15:03 ` Mark Wielaard
@ 2011-07-20  8:29   ` Mark Wielaard
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2011-07-20  8:29 UTC (permalink / raw)
  To: systemtap

On Tue, 2011-07-19 at 17:02 +0200, Mark Wielaard wrote:
> On Tue, 2011-07-19 at 10:58 +0200, Mark Wielaard wrote:
> > pr10854.exp acts strangely on rhel5, it seems fine on f14. It just sits
> > there waiting the reap staprun, which will never happen since it tries
> > to pkill it at the same time, that could be because the startup/exit of
> > staprun/stapio is much more robust now, but I don't fully understand the
> > expect spawn, catch, wait logic. Maybe it is some strange bug in the
> > rhel5 expect? Maybe I changed some expectation of staprun/stapio/module
> > interaction? Any help understanding the expect logic would be
> > appreciated.
> 
> I think I narrowed this down to the following commit:
> 
> commit 5c854d7ca64df766c581c9ed7ff81e04c9d1fa4d
> Author: Chris Meek <cmeek@redhat.com>
> Date:   Wed Jul 13 10:31:47 2011 -0400
> 
>     PR12890: Renaming modules in Staprun
> 
> Although it is somewhat hard to say, because it doesn't always fail. But
> I have never seen it fail before this commit.
> 
> Still trying to understand the real issue and the testcase though. So
> all help appreciated.

Frank seems to have fixed it by changing the testcase as follows:

commit 49909b5572bc61c03cc80ef94f6d00dc5bbf665d
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Tue Jul 19 13:52:58 2011 -0400

    resolve PR12890 vs PR10854 bunfight
    
    The PR10854 test case uses a tight loop of staprun and a nexted loop
    of pkills, written in a way that counts on staprun's pre-PR12890
    "insert; unload; retry insert" module-handling heuristic.  With this
    heuristic gone (and error messages properly generated), the PR10854
    test case goes woozy and hangs in the while { ... pkill ... } tcl
    loop.  Now we don't loop in there any more.

The test now passes on all my setups.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-07-19  8:59 Making the transport layer more robust Mark Wielaard
                   ` (2 preceding siblings ...)
  2011-07-19 15:05 ` William Cohen
@ 2011-07-20 14:13 ` Mark Wielaard
  2011-07-21 17:18 ` David Smith
  2011-08-12 17:43 ` Mark Wielaard
  5 siblings, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2011-07-20 14:13 UTC (permalink / raw)
  To: systemtap

On Tue, 2011-07-19 at 10:58 +0200, Mark Wielaard wrote:
> I am still investigating the test results on
> rhel5, they don't look that good (in particular exelib.exp fails a lot
> of tests because it is unable to find process.mark probe points)

Finally tracked this down to an old binutils bug... sigh. When I finally
knew what it was, the fix was was easy. With this all the 188 exelib.exp
tests PASS also on rhel5 based systems.

Cheers,

Mark

commit 1189063f8adaa13e2da16b51ab665663497ac102
Author: Mark Wielaard <mjw@redhat.com>
Date:   Wed Jul 20 16:05:31 2011 +0200

    Always look for .note.stapsdt sections in the main elf file.
    
    In dwflpp::iterate_over_notes we really want the actual elf file,
    not the dwarf .debug file.  Older binutils had a bug where they
    mangled the SHT_NOTE type during --keep-debug.

diff --git a/dwflpp.cxx b/dwflpp.cxx
index fcc142b..2ab8772 100644
--- a/dwflpp.cxx
+++ b/dwflpp.cxx
@@ -1007,8 +1007,10 @@ int
 dwflpp::iterate_over_notes (void *object, void (*callback)(void *object, int type, const char *data, size_t len))
 {
   Dwarf_Addr bias;
-  Elf* elf = (dwarf_getelf (dwfl_module_getdwarf (module, &bias))
-              ?: dwfl_module_getelf (module, &bias));
+  // Note we really want the actual elf file, not the dwarf .debug file.
+  // Older binutils had a bug where they mangled the SHT_NOTE type during
+  // --keep-debug.
+  Elf* elf = dwfl_module_getelf (module, &bias);
   size_t shstrndx;
   if (elf_getshdrstrndx (elf, &shstrndx))
     return elf_errno();

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-07-19  8:59 Making the transport layer more robust Mark Wielaard
                   ` (3 preceding siblings ...)
  2011-07-20 14:13 ` Mark Wielaard
@ 2011-07-21 17:18 ` David Smith
  2011-08-12 17:43 ` Mark Wielaard
  5 siblings, 0 replies; 25+ messages in thread
From: David Smith @ 2011-07-21 17:18 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap

On 07/19/2011 03:58 AM, Mark Wielaard wrote:
> Again, patch review of the commits would be really appreciated. If only
> to make sure there are others that get to know this part of the code a
> bit more.

I've looked over your transport changes.  They look reasonable to me.  I
added a few more comments about how the STP_OOB_DATA prefixes shouldn't
be translated.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-07-19  8:59 Making the transport layer more robust Mark Wielaard
                   ` (4 preceding siblings ...)
  2011-07-21 17:18 ` David Smith
@ 2011-08-12 17:43 ` Mark Wielaard
  2011-08-15  8:24   ` Mark Wielaard
  2011-08-15 18:30   ` Josh Stone
  5 siblings, 2 replies; 25+ messages in thread
From: Mark Wielaard @ 2011-08-12 17:43 UTC (permalink / raw)
  To: systemtap

On Tue, Jul 19, 2011 at 10:58:28AM +0200, Mark Wielaard wrote:
> I hope that after the next release I can cleanup the transport layer a
> bit more. In particular I think we should really get rid of the
> difference between _stp_ctl_send and _stp_ctl_write, which leads to
> needing an extra timer just to flush the messages from the queue (and
> check for the exit condition). If we can replace that logic with just a
> wait queue based trigger then the code could become a little simpler.
> The extra timer triggers made analyzing the above two bugs that much
> harder.

I finally did this:

commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
Author: Mark Wielaard <mjw@redhat.com>
Date:   Fri Aug 12 19:34:20 2011 +0200

    Remove _stp_ctl_work_timer from module transport layer.
    
    The _stp_ctl_work_timer would trigger every 20ms to check whether
    there were cmd messages queued, but not announced yet and to
    check the _stp_exit_flag was set.
    
    This commit makes all control messages announce themselves and
    check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
    is still possibly delayed since the messages are just pushed on
    a wait queue).

Tested on f14, rhel5 and rhel4 without any regressions in make installcheck.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-08-12 17:43 ` Mark Wielaard
@ 2011-08-15  8:24   ` Mark Wielaard
  2011-08-15 18:30   ` Josh Stone
  1 sibling, 0 replies; 25+ messages in thread
From: Mark Wielaard @ 2011-08-15  8:24 UTC (permalink / raw)
  To: systemtap

Hi,

On Fri, Aug 12, 2011 at 07:43:24PM +0200, Mark Wielaard wrote:
> commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> Author: Mark Wielaard <mjw@redhat.com>
> Date:   Fri Aug 12 19:34:20 2011 +0200
> 
>     Remove _stp_ctl_work_timer from module transport layer.
>     
>     The _stp_ctl_work_timer would trigger every 20ms to check whether
>     there were cmd messages queued, but not announced yet and to
>     check the _stp_exit_flag was set.
>     
>     This commit makes all control messages announce themselves and
>     check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
>     is still possibly delayed since the messages are just pushed on
>     a wait queue).

And with the timer out of the way it wasn't too hard to add poll
support to the command channel so that we can use a sleeping select
on the channel instead of busy-polling in stapio.

commit a9e19b380f9814630018e79b8cafa3c675dd182c
Author: Mark Wielaard <mjw@redhat.com>
Date:   Sun Aug 14 23:07:46 2011 +0200

    Implement and use select to wait for cmd channel data.

    Add a poll implementation to runtime/transport/control.c
    (_stp_ctl_poll_cmd) based on the _stp_ctl_ready_q wait queue.
    Check whether select is supported in runtime/staprun/mainloop.c
    (stp_main_loop) and use pselect with a sigmask that includes
    SIGURG to get EINTR notifications whenever an interruptable
    event occurred.

I am not seeing any regressions with this, but the signal code
in runtime/staprun/mainloop.c is pretty, uhm, creative, so some
extra review and testing would certainly be appreciated.

This has a nice effect on the stapio impact during probing.
With stap 1.6:
$ stap -e 'global scs;
  probe syscall.* { if (execname() == "stapio") scs[name]++ }' -c 'sleep 10' 
scs["read"]=0x5b
scs["fcntl"]=0x52
scs["ppoll"]=0x32
scs["nanosleep"]=0x28
scs["execve"]=0x5
scs["kill"]=0x1
scs["sigreturn"]=0x1
scs["rt_sigaction"]=0x1
scs["rt_sigprocmask"]=0x1
scs["wait4"]=0x1
scs["write"]=0x1

With stap from git trunk:
$ stap -e 'global scs;
  probe syscall.* { if (execname() == "stapio") scs[name]++ }' -c 'sleep 10'
scs["read"]=0x34
scs["ppoll"]=0x32
scs["execve"]=0x5
scs["fcntl"]=0x4
scs["kill"]=0x1
scs["pselect6"]=0x1
scs["sigreturn"]=0x1
scs["rt_sigaction"]=0x1
scs["rt_sigprocmask"]=0x1
scs["wait4"]=0x1
scs["write"]=0x1

So in this example one pselect6 replaces ~38 reads, ~80 fcntls and
~40 nanosleeps. The remaining reads and (timeing out) ppolls come
from the relay channel. I haven't investigated yet whether those
can be eliminated too.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-08-12 17:43 ` Mark Wielaard
  2011-08-15  8:24   ` Mark Wielaard
@ 2011-08-15 18:30   ` Josh Stone
  2011-08-16 13:23     ` Mark Wielaard
  1 sibling, 1 reply; 25+ messages in thread
From: Josh Stone @ 2011-08-15 18:30 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap

On 08/12/2011 10:43 AM, Mark Wielaard wrote:
> commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> Author: Mark Wielaard <mjw@redhat.com>
> Date:   Fri Aug 12 19:34:20 2011 +0200
> 
>     Remove _stp_ctl_work_timer from module transport layer.
>     
>     The _stp_ctl_work_timer would trigger every 20ms to check whether
>     there were cmd messages queued, but not announced yet and to
>     check the _stp_exit_flag was set.
>     
>     This commit makes all control messages announce themselves and
>     check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
>     is still possibly delayed since the messages are just pushed on
>     a wait queue).

This has unfortunately left open an opportunity for deadlock.  The
kernel wake_up infrastructure takes a spinlock on the wait queue.  If
the probe context happens to fire while that lock is held, either via a
direct probe on something called by wake_up or indirectly via NMI, then
the handler must not call anything that would attempt the same lock.
But this commit is triggering a wake_up on ctl prints, and commit
a85c8aff triggers the same on exit().

For example, __wake_up_common is called with a lock held, and then
either of these will cause a deadlock:

  probe kernel.function("__wake_up_common") { warn(pp()) }

  probe kernel.function("__wake_up_common") { exit() }

This issue in general is very similar to PR2525.  We must take care not
to call any blocking code from arbitrary probe context.

Thanks,
Josh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-08-15 18:30   ` Josh Stone
@ 2011-08-16 13:23     ` Mark Wielaard
  2011-08-25 12:12       ` Turgis, Frederic
  0 siblings, 1 reply; 25+ messages in thread
From: Mark Wielaard @ 2011-08-16 13:23 UTC (permalink / raw)
  To: Josh Stone; +Cc: systemtap

On Mon, 2011-08-15 at 11:30 -0700, Josh Stone wrote:
> On 08/12/2011 10:43 AM, Mark Wielaard wrote:
> > commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> > Author: Mark Wielaard <mjw@redhat.com>
> > Date:   Fri Aug 12 19:34:20 2011 +0200
> > 
> >     Remove _stp_ctl_work_timer from module transport layer.
> >     
> >     The _stp_ctl_work_timer would trigger every 20ms to check whether
> >     there were cmd messages queued, but not announced yet and to
> >     check the _stp_exit_flag was set.
> >     
> >     This commit makes all control messages announce themselves and
> >     check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
> >     is still possibly delayed since the messages are just pushed on
> >     a wait queue).
> 
> This has unfortunately left open an opportunity for deadlock.  The
> kernel wake_up infrastructure takes a spinlock on the wait queue.  If
> the probe context happens to fire while that lock is held, either via a
> direct probe on something called by wake_up or indirectly via NMI, then
> the handler must not call anything that would attempt the same lock.
> But this commit is triggering a wake_up on ctl prints, and commit
> a85c8aff triggers the same on exit().
> 
> For example, __wake_up_common is called with a lock held, and then
> either of these will cause a deadlock:
> 
>   probe kernel.function("__wake_up_common") { warn(pp()) }
> 
>   probe kernel.function("__wake_up_common") { exit() }
> 
> This issue in general is very similar to PR2525.  We must take care not
> to call any blocking code from arbitrary probe context.

Thanks for catching that. I am surprised none of our tests triggered
this. I added a nasty testcase based on the above example and reverted
most of the above two commit, reintroducing the timer on the kernel side
(luckily we can still keep the poll/select implementation so we won't be
busy polling on the user side at least). I also tried to explicitly
document all the "safe" places in the patch.

commit fc67febc6733e5803e6883a3757abda6268a953a
Author: Mark Wielaard <mjw@redhat.com>
Date:   Tue Aug 16 14:31:29 2011 +0200

  Reintroduce timer for transport cmd channel, don't wake_up unconditionally.

  Revert parts of commit a85c8a "runtime/io.c: Explicitly signal setting of
  _stp_exit_flag" and commit 46ac9e "Remove _stp_ctl_work_timer from module
  transport layer". Introduce a new test wake_up.exp that shows a deadlock
  when sending cmd messages and waking up the reader immediately.

  Renamed _stp_ctl_write to _stp_ctl_send, which can be called from
  everywhere. Rename _stp_ctl_send to _stp_ctl_send_notify that can be
  called from user context in the transport layer itself (this will
  immediately notify any readers). Document all places that use
  _stp_ctl_send_notify directly to clarify why that is safe.

  See http://sourceware.org/ml/systemtap/2011-q3/msg00163.html

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
  2011-08-16 13:23     ` Mark Wielaard
@ 2011-08-25 12:12       ` Turgis, Frederic
  2011-08-26 15:45         ` Turgis, Frederic
  2011-09-05 11:27         ` Mark Wielaard
  0 siblings, 2 replies; 25+ messages in thread
From: Turgis, Frederic @ 2011-08-25 12:12 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap

Hi,

These changes are relevant to http://sources.redhat.com/ml/systemtap/2011-q1/msg00179.html where we were trying to list the source/rate of recurrent wake-ups in the tool, that would "damage" Power Management activities on ARM (we use systemtap 80% profiling/20% debug)

So there can be some interest in sharing some thoughts on this as this impacts PM related to control channel:

- before changes:
* userspace (staprun/mainloop.c): we poll "control channel" (cmd channel) through non blocking read every 250ms
* kernel space (transport/control.c):
   + we periodically check for IO (_stp_ctl_ready_q not empty) or exit, through timer (used to be work queue)
   + periodic check for IO is in fact for cmd messages that don't announce themselves (STP_OOB_DATA for ex), i.e. are not allowed to wake-up "control wait queue" after we update _stp_ctl_ready_q. Other cmd messages would not need this check.

However, polling in both spaces looked redundant (we had not seen the "exit" at that time) so we have experimented first to set userspace polling to 2s and removed kernel space polling: tool runs fine but at next start, it needs to unload module before reloading it ("exit" issue)

=> long polling delay works OK, kernel periodic check for IO seems redundant with userspace but kernel polling for exit is needed


- after changes:
* userspace: we use "pselect" if supported. Great !
* kernel space:
   + polling for exit is still needed
   + polling for IO:
      * for cmd messages that don't announce themselves -> now mandatory otherwise pselect() would not return
      * for other cmd messages, it still does not look needed but there is no added value in differentiating them from above messages as we must do polling

=> we will experiment with this version playing only with the delay between 2 pollings (STP_CTL_TIMER_INTERVAL)


This will be a good occasion to reopen the PM topic with more experience from usage

Regards
Fred

Frederic Turgis
OMAP Platform Business Unit - OMAP System Engineering - Platform Enablement



Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
  2011-08-25 12:12       ` Turgis, Frederic
@ 2011-08-26 15:45         ` Turgis, Frederic
  2011-08-26 18:45           ` Frank Ch. Eigler
  2011-09-05 11:27         ` Mark Wielaard
  1 sibling, 1 reply; 25+ messages in thread
From: Turgis, Frederic @ 2011-08-26 15:45 UTC (permalink / raw)
  To: Turgis, Frederic, Mark Wielaard; +Cc: systemtap

Hi,

> However, polling in both spaces looked redundant (we had not
> seen the "exit" at that time) so we have experimented first
> to set userspace polling to 2s and removed kernel space
> polling: tool runs fine but at next start, it needs to unload
> module before reloading it ("exit" issue)

To conclude on this, the "exit" we use is through Ctrl-C so is an announced message if I followed code correctly. So I checked for the uncorrect unloading real root cause and this is due to Android not liking exec("/bin/ls", ...) in cleanup_and_exit(). Adding a symlink /bin/sh to /system/bin/sh is sufficient.
Thus also the reason why it worked on Ubuntu.

Regards
Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-08-26 15:45         ` Turgis, Frederic
@ 2011-08-26 18:45           ` Frank Ch. Eigler
  2011-08-29  8:32             ` Turgis, Frederic
  0 siblings, 1 reply; 25+ messages in thread
From: Frank Ch. Eigler @ 2011-08-26 18:45 UTC (permalink / raw)
  To: Turgis, Frederic; +Cc: Mark Wielaard, systemtap


f-turgis wrote:

> [...]
> To conclude on this, the "exit" we use is through Ctrl-C so is an
> announced message if I followed code correctly. So I checked for the
> uncorrect unloading real root cause and this is due to Android not
> liking exec("/bin/ls", ...) in cleanup_and_exit(). [...]

You mean our

    execl("/bin/sh", "sh", "-c", cmd, NULL);

calls in one or two places.  We could change that to 

    execlp("sh", "sh", "-c", cmd, NULL);

if android's $PATH etc. are workable.

- FChE

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
  2011-08-26 18:45           ` Frank Ch. Eigler
@ 2011-08-29  8:32             ` Turgis, Frederic
  2011-08-29 11:21               ` Frank Ch. Eigler
  2011-08-29 14:46               ` Frank Ch. Eigler
  0 siblings, 2 replies; 25+ messages in thread
From: Turgis, Frederic @ 2011-08-29  8:32 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap


> You mean our
>
>     execl("/bin/sh", "sh", "-c", cmd, NULL);
>
> calls in one or two places.  We could change that to
>
>     execlp("sh", "sh", "-c", cmd, NULL);
>
> if android's $PATH etc. are workable.
>
> - FChE
>

I tested that Ok, as Android defines PATH correctly. However, I have found these comments on a systemtap execvp call so I am no sure of possible interferences with other requirements:

/* Note that execvp() is not a direct system call; it does a $PATH
       search in glibc.  We would like to filter out these dummy syscalls
       from the utrace events seen by scripts.

       This filtering would be done for us for free, if we used ptrace
       ...  but see PR6964.  XXX: Instead, we could open-code the
       $PATH search here; put the pause() afterward; and run a direct
       execve instead of execvp().  */

Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-08-29  8:32             ` Turgis, Frederic
@ 2011-08-29 11:21               ` Frank Ch. Eigler
  2011-08-29 14:46               ` Frank Ch. Eigler
  1 sibling, 0 replies; 25+ messages in thread
From: Frank Ch. Eigler @ 2011-08-29 11:21 UTC (permalink / raw)
  To: Turgis, Frederic; +Cc: systemtap

Hi -

On Mon, Aug 29, 2011 at 10:31:48AM +0200, Turgis, Frederic wrote:
> [...]
> I tested that Ok, as Android defines PATH correctly. However, I have
> found these comments on a systemtap execvp call so I am no sure of
> possible interferences with other requirements:
>     /* Note that execvp() is not a direct system call; it does a $PATH
>        search in glibc.  We would like to filter out these dummy syscalls
>        from the utrace events seen by scripts. [...]

Aha, good catch.  We could autoconf it (suffer the extra syscall
traces for android), or use $SHELL perhaps if /bin/sh doesn't exist,
or perform the $PATH search ourselves.

- FChE

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-08-29  8:32             ` Turgis, Frederic
  2011-08-29 11:21               ` Frank Ch. Eigler
@ 2011-08-29 14:46               ` Frank Ch. Eigler
  2011-08-30 13:20                 ` Turgis, Frederic
  1 sibling, 1 reply; 25+ messages in thread
From: Frank Ch. Eigler @ 2011-08-29 14:46 UTC (permalink / raw)
  To: Turgis, Frederic; +Cc: Mark Wielaard, systemtap


> I tested that Ok, as Android defines PATH correctly. However, I have
> found these comments on a systemtap execvp call so I am no sure of
> possible interferences with other requirements: [...]

On second thought ... the place where we were worried about these
extra syscalls is the path that fork/exec's the CMD part of "stap -c
CMD".  In the case of system_cmd() and cleanup_and_exit(), those don't
matter.

- FChE

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
  2011-08-29 14:46               ` Frank Ch. Eigler
@ 2011-08-30 13:20                 ` Turgis, Frederic
  0 siblings, 0 replies; 25+ messages in thread
From: Turgis, Frederic @ 2011-08-30 13:20 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap

Probably overkill but I opened http://sourceware.org/bugzilla/show_bug.cgi?id=13142 to track.
Using execlp has removed issue for other users so that's my favorite if OK for everyone.

Regards
Fred

Frederic Turgis
OMAP Platform Business Unit - OMAP System Engineering - Platform Enablement



>
Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920

-----Original Message-----

> From: Frank Ch. Eigler [mailto:fche@redhat.com]
> Sent: Monday, August 29, 2011 4:47 PM
> To: Turgis, Frederic
> Cc: Mark Wielaard; systemtap@sourceware.org
> Subject: Re: Making the transport layer more robust
>
>
> > I tested that Ok, as Android defines PATH correctly.
> However, I have
> > found these comments on a systemtap execvp call so I am no sure of
> > possible interferences with other requirements: [...]
>
> On second thought ... the place where we were worried about
> these extra syscalls is the path that fork/exec's the CMD
> part of "stap -c CMD".  In the case of system_cmd() and
> cleanup_and_exit(), those don't matter.
>
> - FChE
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
  2011-08-25 12:12       ` Turgis, Frederic
  2011-08-26 15:45         ` Turgis, Frederic
@ 2011-09-05 11:27         ` Mark Wielaard
  2011-09-05 14:32           ` Turgis, Frederic
       [not found]           ` <13872098A06B02418CF379A158C0F1460163182604@dnce02.ent.ti.com>
  1 sibling, 2 replies; 25+ messages in thread
From: Mark Wielaard @ 2011-09-05 11:27 UTC (permalink / raw)
  To: Turgis, Frederic; +Cc: systemtap

Hi Turgis,

On Thu, 2011-08-25 at 14:12 +0200, Turgis, Frederic wrote:

> => long polling delay works OK, kernel periodic check for IO seems
> redundant with userspace but kernel polling for exit is needed

The kernel side "polling" is not just for exit, it is for any cmd
message that is generated from a possible "unsafe" context. We might
want to investigate whether we can be a little smarter about this (maybe
use inatomic() as a check whether we may announce immediately or need
the timer, and/or make the timer smarter so it fires less frequently
when we don't expect any messages?).

> - after changes:
> * userspace: we use "pselect" if supported. Great !
> * kernel space:
>    + polling for exit is still needed
>    + polling for IO:
>       * for cmd messages that don't announce themselves -> now
> mandatory otherwise pselect() would not return
>       * for other cmd messages, it still does not look needed but
> there is no added value in differentiating them from above messages as
> we must do polling
> 
> => we will experiment with this version playing only with the delay
> between 2 pollings (STP_CTL_TIMER_INTERVAL)

I am very interested in any results you get from the new code.

Note that there is another timer firing periodically for "normal"
transport messages (none-command/warn/err/exit/system). This is
controlled by STP_RELAY_TIMER_INTERVAL. This is needed because relayfs
can only notify user space on "buffer overflow". So when running stap in
"line mode" (aka non-bulk-mode, not using stap -b), the kernel-side
needs to periodically poll to see if any new messages should be
immediately given to user space. Running in bulk-mode (-b) will
completely get rid of this timer, but I don't yet know how to make it
less noticeable for line-mode, when you want to get notified about probe
messages immediately.

Please let us know about any experiments you run that might help us
better tune these timers.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
  2011-09-05 11:27         ` Mark Wielaard
@ 2011-09-05 14:32           ` Turgis, Frederic
       [not found]           ` <13872098A06B02418CF379A158C0F1460163182604@dnce02.ent.ti.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Turgis, Frederic @ 2011-09-05 14:32 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap

>The kernel side "polling" is not just for exit, it is for any cmd
>message that is generated from a possible "unsafe"
>context

I have then probably not understood code correctly (code was before latest changes, now this polling is mandatory as I mentioned later in mail). "unsafe" context is associated to unannounced message, isn't it ? Well, even for announced messages, I have the impression that reading message relies only on user side polling because kernel side is not waiting for a wake-up of _stp_ctl_ready_q. Here is my understanding but I didn't take time to perform some traces:

- for annouced messages or on kernel polling (_stp_ctl_work_callback), I understand that we trigger a reading through wake_up_interruptible(&_stp_ctl_wq);

- on user side, main reading loop does:
flags |= O_NONBLOCK;
fcntl(control_channel, F_SETFL, flags);
nb = read(control_channel, &recvbuf, sizeof(recvbuf)); So I expect a non blocking read (however, there may be another place where we read cmd message)

This ends in "_stp_ctl_read_cmd" in kernel doing:
while (list_empty(&_stp_ctl_ready_q)) {
                spin_unlock_irqrestore(&_stp_ctl_ready_lock, flags);
                if (file->f_flags & O_NONBLOCK) -> non blocking read, we rely on polling to recheck _stp_ctl_ready_q
                        return -EAGAIN;
                if (wait_event_interruptible(_stp_ctl_wq, !list_empty(&_stp_ctl_ready_q))) -> code not reached so kernel polling (or even message annoucement) useless ?
                        return -ERESTARTSYS;

>I am very interested in any results you get from the new code.

Never tested bulk mode. We quite like filling up buffer and doing a long buffer dump but doing very small regular dumps could also work.

Our modifications are just ugly hacks to understand the internals. They make sense for some, but for some other parts, we probably have different requirements between a server and an embedded platform. Capability to tune a timer would be OK (or maybe bulk-mode would be good)

Here are the v1.3 experiments we performed few months ago (latest months have been too busy with customer to share before :-( )
It seemed to work well at 2 levels:
- task scheduling monitoring: systemtap work-queues and staprun/stapio processes were seen only every s or more, which is OK. And occurrence was matching with timer setting.
- power consumption monitoring (requires specific HW) was showing no CPU activity (interrupts, timers during Idle task are not seen at scheduler level)

Control channel userpace polling (well, I consider control everything that is not trace/data output from script)
-      usleep (250*1000); /* sleep 250ms between polls */
+      usleep (2000*1000); /* sleep 250ms between polls */  -> no longer needed with pselect()

Control channel kernel polling (you might find it a bit extreme ;-) )
-       if (likely(_stp_ctl_attached))
-               queue_delayed_work(_stp_wq, &_stp_work, STP_WORK_TIMER);
+       //if (likely(_stp_ctl_attached))
+       //      queue_delayed_work(_stp_wq, &_stp_work, STP_WORK_TIMER); -> reworked ;-)

Data channel userspace timeout of select()
-       struct timespec tim = {.tv_sec=0, .tv_nsec=200000000}, *timeout = &tim;
+       struct timespec tim = {.tv_sec=5, .tv_nsec=0}, *timeout = &tim;   -> timeout so could be fair to be that high

Data channel kernel polling
-#define STP_RELAY_TIMER_INTERVAL               ((HZ + 99) / 100)
+#define STP_RELAY_TIMER_INTERVAL       HZ              /* ((HZ + 99) / 100) */  -> wake-up every s, we may need tunable

Of course reliability depends on data trace throughput. Main contributor is task scheduling monitoring, around 0.5MB/s max. I had done the computation of number of relayfs buffer*buffer size: we could not overflow all buffers with insufficient wake-up to dump trace.

For v1.5 (and next), we handle control channel kernel side through STP_CTL_INTERVAL and get rid of our old ugly patch. May require tunable for embedded tests as it does not sound very logic to not poll regularly if we want messages back quickly

Regards
fred

Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
       [not found]           ` <13872098A06B02418CF379A158C0F1460163182604@dnce02.ent.ti.com>
@ 2011-09-06 10:12             ` Mark Wielaard
  2011-09-06 14:30               ` Turgis, Frederic
  2011-09-06 14:37               ` David Smith
  0 siblings, 2 replies; 25+ messages in thread
From: Mark Wielaard @ 2011-09-06 10:12 UTC (permalink / raw)
  To: Turgis, Frederic; +Cc: systemtap

Hi Turgis,

On Mon, 2011-09-05 at 16:04 +0200, Turgis, Frederic wrote:
> >The kernel side "polling" is not just for exit, it is for any
> >cmd message that is generated from a possible "unsafe"
> >context
> 
> I have then probably not understood code correctly (code was before
> latest changes, now this polling is mandatory as I mentioned later in
> mail). "unsafe" context is associated to unannounced message, no ?

Yes, it depends on the probe point the message is generated from. Since
in general we cannot know whether a probe is "safe" or not, we always
just queue the message and continue. The kernel timer then periodically
checks the queue (and the exit flag) and announces it. I added a
particularly nasty example of a probe point where immediately announcing
the message to user space would be really bad:
testsuite/systemtap.base/wakeup.exp

>  Well, even for announced messages, I have the impression that reading
> message relies only on user side polling because user side is not
> waiting for a wake-up of _stp_ctl_ready_q. Here is my understanding
> but I didn't take time to perform some traces:
> 
> - for annouced messages or on kernel polling (_stp_ctl_work_callback),
> I understand that we trigger a reading through
> wake_up_interruptible(&_stp_ctl_wq);

Correct.

> - on user side, main reading loop does:
> flags |= O_NONBLOCK;
> fcntl(control_channel, F_SETFL, flags);
> nb = read(control_channel, &recvbuf, sizeof(recvbuf));
> So I expect a non blocking read (however, there may be another place where we read cmd message)

Indeed. That is the first pass, if there is anything to read, we read
it, but then (if select is supported) we call select() on the
control_channel. This means we will wait there in user space until the
kernel explicitly wakes us up to tell there are messages waiting, but we
don't need to poll ourselves in user space. Only when select returns
will we try to read a message from the kernel again.

> This ends in "_stp_ctl_read_cmd" in kernel doing:
> while (list_empty(&_stp_ctl_ready_q)) {
>                 spin_unlock_irqrestore(&_stp_ctl_ready_lock, flags);
>                 if (file->f_flags & O_NONBLOCK) -> non blocking read, we rely on polling to recheck _stp_ctl_ready_q
>                         return -EAGAIN;
>                 if (wait_event_interruptible(_stp_ctl_wq, !list_empty(&_stp_ctl_ready_q))) -> code not reached so kernel polling (or even message annoucement) useless ?
>                         return -ERESTARTSYS;

Yeah, you are right, that second bit is a little strange, we kind of
depend on user space always calling us with O_NONBLOCK. But if it
doesn't, then at least we do something somewhat sensible.

> >I am very interested in any results you get from the new code.
> 
> Never tested bulk mode. We quite like filling up buffer and doing a
> big dump but doing very small regular dumps could also work.

I think you should also try bulk mode (stap -b) if possible. It won't
give you immediate results, but because of the extra buffering it should
definitely be less overhead.

> Our modifications are just ugly hacks to understand the internals.
> They make sense for some, but for some other parts, we probably have
> different requirements between a server and an embedded platform.
> Capability to tune a timer would be OK (or maybe bulk-mode would be
> good)
> 
> Here are the v1.3 experiments we performed few months ago (latest
> months have been too busy with customer to share before :-( ). Goal
> was to remove "stapio:5192" process and "systemtap/1:5189" workqueue
> from waking up too often, here every 200ms and 7.8125ms (HZ=128)
> respectively. Attached picture shows what each core runs in time
> (blue=core 0, red=core 1, the lower a process, the more it runs,
> except swapper=Idle tasks always at bottom)

Looks like the attachement somehow didn't reach the mailinglist. odd.

> Control channel userpace polling (well, I consider control everything that is not trace/data output from script)
> -      usleep (250*1000); /* sleep 250ms between polls */
> +      usleep (2000*1000); /* sleep 250ms between polls */

So with current git trunk, you get pselect support, so don't need to
tweak this anymore.

> Control channel kernel polling (you might find it a bit extreme ;-) )
> -       if (likely(_stp_ctl_attached))
> -               queue_delayed_work(_stp_wq, &_stp_work, STP_WORK_TIMER);
> +       //if (likely(_stp_ctl_attached))
> +       //      queue_delayed_work(_stp_wq, &_stp_work, STP_WORK_TIMER);

yeah, that is somewhat extreme, if possible try just
-DSTP_WORK_TIMER=... to get a good balance. I am interested in what
value works well for you.

> Data channel userspace timeout of select()
> -       struct timespec tim = {.tv_sec=0, .tv_nsec=200000000}, *timeout = &tim;
> +       struct timespec tim = {.tv_sec=5, .tv_nsec=0}, *timeout = &tim;

With bulk mode this shouldn't be necessary. But it might be nice to make
this tunable too with some -D option.

> Data channel kernel polling
> -#define STP_RELAY_TIMER_INTERVAL               ((HZ + 99) / 100)
> +#define STP_RELAY_TIMER_INTERVAL       HZ              /* ((HZ + 99) / 100) */  -> wake-up every s

Hopefully you can use a -DSTP_RELAY_TIMER_INTERVAL for this too on the
command line instead. I am really hoping we make it so that you don't
have to hack the sources anymore just for experimenting with the
timers/poll intervals.

Thanks for your experiments and report.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
  2011-09-06 10:12             ` Mark Wielaard
@ 2011-09-06 14:30               ` Turgis, Frederic
  2011-09-06 14:37               ` David Smith
  1 sibling, 0 replies; 25+ messages in thread
From: Turgis, Frederic @ 2011-09-06 14:30 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap

Hi,

>Indeed. That is the first pass, if there is anything to read,
>we read it, but then (if select is supported) we call select()
>on the control_channel

Mmmmh, we are probably not talking about the same thing ;-) I was stating "before changes/after changes" in 1st mail to mention introduction of select() in the code. In fact I was doing a comparison between v1.3 and latest mainline.
To explain my point, I then re-explained the "before changes" but did not emphasize it was the case. And in "before changes", code states: "The runtime does not implement select() on the command filehandle"

No select() means kernel polling is useless thanks to userspace polling. But that is old story !

>Yeah, you are right, that second bit is a little strange, we
>kind of depend on user space always calling us with
>O_NONBLOCK. But if it doesn't, then at least we do something
>somewhat sensible.

In the past, I had wondered if userspace could not make a blocking read to remove userspace polling. But now we have select() so we could probably remove any (dead) code related to blocking read.

>> Control channel userpace polling (well, I consider control
>everything that is not trace/data output from script)
>> -      usleep (250*1000); /* sleep 250ms between polls */
>> +      usleep (2000*1000); /* sleep 250ms between polls */
>
>So with current git trunk, you get pselect support, so don't
>need to tweak this anymore.

Aligned

>yeah, that is somewhat extreme, if possible try just
>-DSTP_WORK_TIMER=... to get a good balance. I am interested in
>what value works well for you.

Acceptable on v1.3 ;-)
On v1.5, we kept polling and went for 1s and it seemed to work well. This is giving latency I imagine but 1s is not annoying for us. I had to stop testing to have board repaired.

>
>> Data channel userspace timeout of select()
>> -       struct timespec tim = {.tv_sec=0,
>.tv_nsec=200000000}, *timeout = &tim;
>> +       struct timespec tim = {.tv_sec=5, .tv_nsec=0}, *timeout =
>> + &tim;
>
>With bulk mode this shouldn't be necessary. But it might be
>nice to make this tunable too with some -D option.

The key point here and above is that we can use -D as anyone has to recompile kernel module. This would be the customization we are looking for (slow polling for PM cases, fast polling for cases with lot of trace).
Userspace would need to be frozen (thanks to select()/ppoll()) as we want to deliver it pre-compiled to other developers. So -D would not really help.
What is the rationale behind 200ms timeout ? Bulkmode code explains that there is a potential race condition but then sets a timeout of 10s.

Well, we are close to close on these Power Management topics, great...

Regards
Fred

Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-09-06 10:12             ` Mark Wielaard
  2011-09-06 14:30               ` Turgis, Frederic
@ 2011-09-06 14:37               ` David Smith
  2011-09-06 15:37                 ` David Smith
  1 sibling, 1 reply; 25+ messages in thread
From: David Smith @ 2011-09-06 14:37 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Turgis, Frederic, systemtap

On 09/06/2011 05:12 AM, Mark Wielaard wrote:

> Hopefully you can use a -DSTP_RELAY_TIMER_INTERVAL for this too on the
> command line instead. I am really hoping we make it so that you don't
> have to hack the sources anymore just for experimenting with the
> timers/poll intervals.

I checked, and the user can override STP_RELAY_TIMER_INTERVAL in head
systemtap (with '-DSTP_RELAY_TIMER_INTERVAL=...').  I just checked in
commit 17733c2 to allow the user to do the same with STP_WORK_TIMER
(with -DSTP_WORK_TIMER=...').

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Making the transport layer more robust
  2011-09-06 14:37               ` David Smith
@ 2011-09-06 15:37                 ` David Smith
  2011-09-06 16:25                   ` Turgis, Frederic
  0 siblings, 1 reply; 25+ messages in thread
From: David Smith @ 2011-09-06 15:37 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Turgis, Frederic, systemtap

On 09/06/2011 09:37 AM, David Smith wrote:

> On 09/06/2011 05:12 AM, Mark Wielaard wrote:
> 
>> Hopefully you can use a -DSTP_RELAY_TIMER_INTERVAL for this too on the
>> command line instead. I am really hoping we make it so that you don't
>> have to hack the sources anymore just for experimenting with the
>> timers/poll intervals.
> 
> I checked, and the user can override STP_RELAY_TIMER_INTERVAL in head
> systemtap (with '-DSTP_RELAY_TIMER_INTERVAL=...').  I just checked in
> commit 17733c2 to allow the user to do the same with STP_WORK_TIMER
> (with -DSTP_WORK_TIMER=...').


So then I thought "why don't I rename 'STP_WORK_TIMER' to
'STP_WORK_TIMER_INTERVAL' to match 'STP_RELAY_TIMER_INTERVAL', only to
realize that 'STP_WORK_TIMER' isn't used anymore.

So, commit d354062 deletes STP_WORK_TIMER.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Making the transport layer more robust
  2011-09-06 15:37                 ` David Smith
@ 2011-09-06 16:25                   ` Turgis, Frederic
  0 siblings, 0 replies; 25+ messages in thread
From: Turgis, Frederic @ 2011-09-06 16:25 UTC (permalink / raw)
  To: David Smith, Mark Wielaard; +Cc: systemtap

>So then I thought "why don't I rename 'STP_WORK_TIMER' to
>'STP_WORK_TIMER_INTERVAL' to match 'STP_RELAY_TIMER_INTERVAL',
>only to realize that 'STP_WORK_TIMER' isn't used anymore.
>
>So, commit d354062 deletes STP_WORK_TIMER.

Right, latest timer names help to the understanding of code (even if STP and TIMER were good inputs first time we digged into code):
- STP_RELAY_TIMER_INTERVAL
- STP_CTL_TIMER_INTERVAL


Frederic Turgis
OMAP Platform Business Unit - OMAP System Engineering - Platform Enablement


>
>--
>David Smith
>dsmith@redhat.com
>Red Hat
>http://www.redhat.com
>256.217.0141 (direct)
>256.837.0057 (fax)
>
Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-09-06 16:25 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-19  8:59 Making the transport layer more robust Mark Wielaard
2011-07-19 11:42 ` Mark Wielaard
2011-07-19 15:03 ` Mark Wielaard
2011-07-20  8:29   ` Mark Wielaard
2011-07-19 15:05 ` William Cohen
2011-07-20 14:13 ` Mark Wielaard
2011-07-21 17:18 ` David Smith
2011-08-12 17:43 ` Mark Wielaard
2011-08-15  8:24   ` Mark Wielaard
2011-08-15 18:30   ` Josh Stone
2011-08-16 13:23     ` Mark Wielaard
2011-08-25 12:12       ` Turgis, Frederic
2011-08-26 15:45         ` Turgis, Frederic
2011-08-26 18:45           ` Frank Ch. Eigler
2011-08-29  8:32             ` Turgis, Frederic
2011-08-29 11:21               ` Frank Ch. Eigler
2011-08-29 14:46               ` Frank Ch. Eigler
2011-08-30 13:20                 ` Turgis, Frederic
2011-09-05 11:27         ` Mark Wielaard
2011-09-05 14:32           ` Turgis, Frederic
     [not found]           ` <13872098A06B02418CF379A158C0F1460163182604@dnce02.ent.ti.com>
2011-09-06 10:12             ` Mark Wielaard
2011-09-06 14:30               ` Turgis, Frederic
2011-09-06 14:37               ` David Smith
2011-09-06 15:37                 ` David Smith
2011-09-06 16:25                   ` Turgis, Frederic

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).