public inbox for systemtap@sourceware.org
* [Bug runtime/10189] New: STP_START gets lost in a warning flood
@ 2009-05-22 20:45 jistone at redhat dot com
  2009-11-16  9:47 ` [Bug runtime/10189] " wenji dot huang at oracle dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: jistone at redhat dot com @ 2009-05-22 20:45 UTC
  To: systemtap

This command will stall:
  stap -e 'probe begin { while (++i<100) warn("something is wrong!") }' -c true

Such a warning flood can happen, for example, when a lot of kernel.function or
kprobe.function probes fail to register.  The script is supposed to continue
anyway, but in this case the child process is waiting for a SIGUSR1 before it
begins.  That signal should be sent on an STP_START control message, but
debugging reveals that STP_START is never seen by stapio.

I suspect that the control channel is overflowing with STP_OOB_DATA for the
warnings, and so the STP_START is dropped.
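
The suspected failure mode can be sketched in isolation. This is only a toy model, not the systemtap runtime: the pool size, the `MSG_*` names, and the helpers `pool_send` and `flood_then_start` are all assumptions made for illustration. It shows how a START message can be lost when it shares a fixed set of buffers with a flood of OOB messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types, named after those mentioned in the report. */
enum msg_type { MSG_START, MSG_OOB_DATA };

#define POOL_SIZE 8   /* assumed size; the real pool size is not stated here */

struct pool { int free_slots; };

/* Take one buffer from the shared pool.
 * Returns 0 on success, -1 when no buffer is left. */
static int pool_send(struct pool *p, enum msg_type type)
{
    (void)type;              /* every type competes for the same slots */
    assert(p != NULL);
    if (p->free_slots == 0)
        return -1;
    p->free_slots--;
    return 0;
}

/* Queue `warnings` OOB messages with nobody draining the pool,
 * then try to send START. Returns the result of the START send. */
static int flood_then_start(int warnings)
{
    struct pool p = { POOL_SIZE };
    for (int i = 0; i < warnings; i++)
        (void)pool_send(&p, MSG_OOB_DATA);   /* failed sends are dropped */
    return pool_send(&p, MSG_START);
}
```

In this model `flood_then_start(3)` succeeds, while `flood_then_start(99)` (the number of warnings the script above emits) fails, because the warnings have already taken every slot.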

-- 
           Summary: STP_START gets lost in a warning flood
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: jistone at redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=10189

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug runtime/10189] STP_START gets lost in a warning flood
  2009-05-22 20:45 [Bug runtime/10189] New: STP_START gets lost in a warning flood jistone at redhat dot com
@ 2009-11-16  9:47 ` wenji dot huang at oracle dot com
  2009-11-17  8:06 ` wenji dot huang at oracle dot com
  2009-11-17 17:31 ` fche at redhat dot com
  2 siblings, 0 replies; 4+ messages in thread
From: wenji dot huang at oracle dot com @ 2009-11-16  9:47 UTC
  To: systemtap


------- Additional Comments From wenji dot huang at oracle dot com  2009-11-16 09:47 -------
dmesg says:
stap_d5ab9c05965c68b7bcab4fa28635aa95_744: systemtap: 1.0/0.143, base: d1ab5000, memory: 10488+11284+1704+13600 data+text+ctx+net, probes: 1
ctl_send (type=0 len=8) failed: -12

That means -ENOMEM. Increasing STP_DEFAULT_BUFFERS avoids the failure.
I'm not sure whether the root cause is a memory leak or a memory pool
that is allocated too small.
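
For reference, the mapping from the logged value to an errno name can be checked from user space. The helper below is hypothetical (`describe_kernel_rc` is not part of systemtap); it only assumes the usual kernel convention that failures are returned as negative errno values, and that errno 12 is ENOMEM, as it is on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a kernel-style negative return code such as -12
 * into the libc description of the matching errno value. */
static const char *describe_kernel_rc(int rc)
{
    assert(rc < 0);          /* kernel-style errors are negative errno values */
    return strerror(-rc);
}
```

Calling `describe_kernel_rc(-12)` returns the same text as `strerror(ENOMEM)`, which is consistent with reading the `ctl_send` failure above as an out-of-memory condition.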



* [Bug runtime/10189] STP_START gets lost in a warning flood
  2009-05-22 20:45 [Bug runtime/10189] New: STP_START gets lost in a warning flood jistone at redhat dot com
  2009-11-16  9:47 ` [Bug runtime/10189] " wenji dot huang at oracle dot com
@ 2009-11-17  8:06 ` wenji dot huang at oracle dot com
  2009-11-17 17:31 ` fche at redhat dot com
  2 siblings, 0 replies; 4+ messages in thread
From: wenji dot huang at oracle dot com @ 2009-11-17  8:06 UTC
  To: systemtap


------- Additional Comments From wenji dot huang at oracle dot com  2009-11-17 08:06 -------
Current stap keeps allocating buffers for _stp_warn in the probe until it
runs out of memory. As a result, _stp_ctl_send(STP_START,...) fails
because memory is exhausted, so the child process never receives the
signal and remains waiting.

Increasing STP_DEFAULT_BUFFERS is not a good fix, because there will
always be a limit. Maybe it would be better to make _stp_warn use
_stp_print instead of _stp_ctl_write.

diff --git a/runtime/io.c b/runtime/io.c
index 0136aae..10b6c8a 100644
--- a/runtime/io.c
+++ b/runtime/io.c
@@ -55,7 +55,7 @@ static void _stp_vlog (enum code type, const char *func, int line, const char *f
                 else if (type == ERROR) printk (KERN_ERR "%s", buf);
                 else printk (KERN_INFO "%s", buf);
 #else
-               if (type != DBUG)
+               if (type != DBUG && type != WARN)
                        _stp_ctl_write(STP_OOB_DATA, buf, start + num + 1);
                else {
                        _stp_print(buf);


* [Bug runtime/10189] STP_START gets lost in a warning flood
  2009-05-22 20:45 [Bug runtime/10189] New: STP_START gets lost in a warning flood jistone at redhat dot com
  2009-11-16  9:47 ` [Bug runtime/10189] " wenji dot huang at oracle dot com
  2009-11-17  8:06 ` wenji dot huang at oracle dot com
@ 2009-11-17 17:31 ` fche at redhat dot com
  2 siblings, 0 replies; 4+ messages in thread
From: fche at redhat dot com @ 2009-11-17 17:31 UTC
  To: systemtap


------- Additional Comments From fche at redhat dot com  2009-11-17 17:31 -------
(In reply to comment #2)
> Current stap keeps allocating buffers for _stp_warn in the probe until it
> runs out of memory.

That's wrong; script-accessible constructs like error()/warn() should
not be able to deplete a dynamically allocated resource.  OTOH in this
case it looks like the transport is just using its own _stp_mempool_alloc,
which at least is a private resource.

> Maybe it's better to make _stp_warn utilize _stp_print instead of
> _stp_ctl_write.

Then we'd have the same problem with error().  The OOB messages do serve a
useful purpose at staprun time, so let's find a way of preserving OOB while
ensuring that the basic protocol can proceed.

For example, we can know that some of these kernel->staprun messages only
occur singly (i.e., are not queued en masse).  So for them, we could
preallocate a mem_buffer element, to avoid having them contend.  One way
would be to have separate mempools for each message type, and to ensure
that the singleton messages only allocate one or two elements total.
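
A minimal sketch of that idea, under stated assumptions: the names (`ctl_pools`, `ctl_alloc`, `CTL_*`), the slot counts, and the plain counters standing in for mempools are all hypothetical and are not the runtime's actual data structures. The point is only that a singleton message with its own preallocated element cannot be starved by messages that are queued en masse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message types for illustration. */
enum ctl_type { CTL_START, CTL_OOB_DATA };

#define SHARED_SLOTS 8   /* assumed size of the pool shared by bulk messages */

struct ctl_pools {
    int  shared_free;     /* slots for messages that can be queued en masse */
    bool start_reserved;  /* one preallocated element used only by CTL_START */
};

static void ctl_pools_init(struct ctl_pools *p)
{
    assert(p != NULL);
    p->shared_free = SHARED_SLOTS;
    p->start_reserved = true;
}

/* Returns 0 on success, -1 when the pool for this type is exhausted. */
static int ctl_alloc(struct ctl_pools *p, enum ctl_type type)
{
    assert(p != NULL);
    if (type == CTL_START) {
        /* Singleton message: draws only from its own reserved element. */
        if (!p->start_reserved)
            return -1;
        p->start_reserved = false;
        return 0;
    }
    /* Bulk messages contend with each other, but not with CTL_START. */
    if (p->shared_free == 0)
        return -1;
    p->shared_free--;
    return 0;
}

/* Flood the shared pool with `warnings` OOB messages, then allocate START. */
static int reserved_start_after_flood(int warnings)
{
    struct ctl_pools p;
    ctl_pools_init(&p);
    for (int i = 0; i < warnings; i++)
        (void)ctl_alloc(&p, CTL_OOB_DATA);   /* excess warnings are dropped */
    return ctl_alloc(&p, CTL_START);
}
```

Here `reserved_start_after_flood(99)` still succeeds: the excess warnings are dropped once the shared slots run out, but the START element is untouched, so OOB delivery is preserved as far as the pool allows while the basic protocol can proceed.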

