RE: Making the transport layer more robust

public inbox for systemtap@sourceware.org

From: "Turgis, Frederic" <f-turgis@ti.com>
To: Mark Wielaard <mjw@redhat.com>
Cc: "systemtap@sourceware.org" <systemtap@sourceware.org>
Subject: RE: Making the transport layer more robust
Date: Mon, 05 Sep 2011 14:32:00 -0000
Message-ID: <13872098A06B02418CF379A158C0F1460163182646@dnce02.ent.ti.com>
In-Reply-To: <1315222009.3431.22.camel@springer.wildebeest.org>

>The kernel side "polling" is not just for exit, it is for any cmd
>message that is generated from a possible "unsafe"
>context

Then I have probably misunderstood the code (I was looking at the code from before the latest changes; this polling is now mandatory, as I mention later in this mail). An "unsafe" context is associated with unannounced messages, isn't it? Still, even for announced messages, I have the impression that reading a message relies only on user-side polling, because the kernel side never actually waits to be woken up when _stp_ctl_ready_q receives a message. Here is my understanding, although I did not take the time to collect traces:

- for announced messages, or on kernel polling (_stp_ctl_work_callback), I understand that we trigger a read through wake_up_interruptible(&_stp_ctl_wq);

- on the user side, the main reading loop does:
flags |= O_NONBLOCK;
fcntl(control_channel, F_SETFL, flags);
nb = read(control_channel, &recvbuf, sizeof(recvbuf));
So I expect a non-blocking read (however, there may be another place where we read cmd messages).

This ends up in _stp_ctl_read_cmd() in the kernel, which does:
while (list_empty(&_stp_ctl_ready_q)) {
                spin_unlock_irqrestore(&_stp_ctl_ready_lock, flags);
                if (file->f_flags & O_NONBLOCK) -> non-blocking read: we rely on polling to recheck _stp_ctl_ready_q
                        return -EAGAIN;
                if (wait_event_interruptible(_stp_ctl_wq, !list_empty(&_stp_ctl_ready_q))) -> code never reached, so is kernel polling (or even message announcement) useless?
                        return -ERESTARTSYS;

>I am very interested in any results you get from the new code.

We have never tested bulk mode. We quite like filling up the buffer and then doing one long buffer dump, but doing very small regular dumps could also work.

Our modifications are just ugly hacks to understand the internals. Some of them may make sense in general, but for other parts we probably have different requirements on a server than on an embedded platform. The ability to tune a timer would be OK (or maybe bulk mode would be good).

Here are the v1.3 experiments we performed a few months ago (the last months have been too busy with customers to share them earlier :-( ).
They seemed to work well at 2 levels:
- task scheduling monitoring: the systemtap work queues and the staprun/stapio processes were seen only once per second or less often, which is OK, and their occurrence matched the timer settings.
- power consumption monitoring (which requires specific HW) showed no CPU activity. This check matters because interrupts and timers firing during the idle task are not visible at the scheduler level.

Control channel userspace polling (I consider "control" to be everything that is not trace/data output from the script)
-      usleep (250*1000); /* sleep 250ms between polls */
+      usleep (2000*1000); /* sleep 2s between polls */  -> no longer needed with pselect()

Control channel kernel polling (you might find it a bit extreme ;-) )
-       if (likely(_stp_ctl_attached))
-               queue_delayed_work(_stp_wq, &_stp_work, STP_WORK_TIMER);
+       //if (likely(_stp_ctl_attached))
+       //      queue_delayed_work(_stp_wq, &_stp_work, STP_WORK_TIMER); -> reworked ;-)

Data channel userspace timeout of select()
-       struct timespec tim = {.tv_sec=0, .tv_nsec=200000000}, *timeout = &tim;
+       struct timespec tim = {.tv_sec=5, .tv_nsec=0}, *timeout = &tim;   -> this is only a timeout, so it seems fair for it to be that high

Data channel kernel polling
-#define STP_RELAY_TIMER_INTERVAL               ((HZ + 99) / 100)
+#define STP_RELAY_TIMER_INTERVAL       HZ              /* ((HZ + 99) / 100) */  -> wake up every second; we may need a tunable

Of course, reliability depends on trace data throughput. The main contributor is task scheduling monitoring, at around 0.5MB/s max. I computed the total relayfs capacity (number of relayfs buffers * buffer size): even with these infrequent wake-ups, we could not overflow all buffers before the trace was dumped.

For v1.5 (and later), we handle the kernel side of the control channel through STP_CTL_INTERVAL and have dropped our old ugly patch. It may require a tunable for embedded tests, since it would not be logical to stop polling regularly by default if we want messages back quickly.

Regards
fred

Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920
