public inbox for systemtap@sourceware.org
* Re: [Fwd: Re: stap early exit]
       [not found] <1207840624.3989.5.camel@dyn9047018139.beaverton.ibm.com>
@ 2008-04-11 19:20 ` Darren Hart
  2008-04-11 20:22   ` Frank Ch. Eigler
From: Darren Hart @ 2008-04-11 19:20 UTC
  To: Jim Keniston; +Cc: systemtap


On Thu, 2008-04-10 at 08:17 -0700, Jim Keniston wrote:
> -------- Forwarded Message --------
> From: Frank Ch. Eigler <fche@redhat.com>
> To: Jim Keniston <jkenisto@us.ibm.com>
> Cc: systemtap <systemtap@sources.redhat.com>
> Subject: Re: stap early exit
> Date: Wed, 09 Apr 2008 22:05:02 -0400
> 
> Jim Keniston <jkenisto@us.ibm.com> writes:
> 
> > A SystemTap user at IBM is seeing his stap script terminate after a few
> > minutes, for no reason that he or I can figure out.  The final message
> > is:
> > stapio:cleanup_and_exit:229 closing control channel
> >[...]
> > dvhltc@us.ibm.... stapio:cleanup_and_exit:229 closing control channel
> > Pass 5: run completed in 50usr/120sys/213446real ms.
> 
> Indeed odd.  A few things to try to help narrow it down:
> 
> - check if the phenomenon reoccurs

It does.

> - check if it's a regular time interval

It appears to vary with how much output I'm doing in the tap.

> - change the "flag[tid()] = 0" to "delete flag[tid()]"

Good idea, done.  Now the tapset will just stop... no longer getting the
cleanup_and_exit message.

> - try removing the print_backtrace() calls

This makes it run considerably longer.

> - run with "stap -t"

OK....

> - check whether any probes were skipped

Yes, I've seen between 30 and 100 probes skipped.

> - see whether the stap* processes may have operated under some 
>   resource limit like cpu time

Running as root, I think I'm OK here:

# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 40960
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 40960
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

> - try bumping up buffer size

How do I do that?  The -s option?  'man stap' didn't state the default
for -s; what do you suggest I start with?

Thanks for all the tips!

> 
> 
> - FChE
> 
-- 
Darren Hart
Real-Time Linux Team
IBM Linux Technology Center


* Re: [Fwd: Re: stap early exit]
  2008-04-11 19:20 ` [Fwd: Re: stap early exit] Darren Hart
@ 2008-04-11 20:22   ` Frank Ch. Eigler
From: Frank Ch. Eigler @ 2008-04-11 20:22 UTC
  To: Darren Hart; +Cc: Jim Keniston, systemtap

Darren Hart <dvhltc@us.ibm.com> writes:

> [...]
>> - change the "flag[tid()] = 0" to "delete flag[tid()]"
>
> Good idea, done.  Now the tapset will just stop... no longer getting the
> cleanup_and_exit message.

The effect of this should be to reduce the possibility of an array
overflow.  Should an array overflow, an explicit error message is
supposed to be printed, so it should not be a mystery.
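
A toy model of why the "delete" form helps, written in Python rather
than SystemTap.  It assumes that global arrays have a fixed number of
slots and that assigning 0 leaves the key occupying its slot; the class
name, function name, and capacity below are hypothetical and are not
taken from the SystemTap runtime:

```python
class FixedCapacityMap:
    """Toy stand-in for a preallocated global array (hypothetical model)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.slots = {}

    def set(self, key, value):
        # A new key needs a free slot; an existing key is updated in place.
        if key not in self.slots and len(self.slots) >= self.max_entries:
            raise OverflowError("array overflow")
        self.slots[key] = value

    def delete(self, key):
        # Deleting releases the slot so another key can reuse it.
        self.slots.pop(key, None)


def run(thread_ids, max_entries, use_delete):
    """Return how many threads were handled before the array overflowed."""
    flag = FixedCapacityMap(max_entries)
    handled = 0
    for tid in thread_ids:
        try:
            flag.set(tid, 1)          # entry probe: flag[tid()] = 1
            if use_delete:
                flag.delete(tid)      # return probe: delete flag[tid()]
            else:
                flag.set(tid, 0)      # return probe: flag[tid()] = 0
        except OverflowError:
            break
        handled += 1
    return handled
```

With four slots and ten distinct thread ids, the assign-zero variant
overflows on the fifth thread, while the delete variant handles all ten
because each slot is released before the next thread needs one.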

>> - try removing the print_backtrace() calls
> This makes it run considerably longer. [...]

OK.  This, and ...

>> - check whether any probes were skipped
> Yes, I've seen between 30 and 100 probes skipped.

... this makes it likely that contention on the script's global
variables is indeed what's responsible.  You might find this wiki
article helpful:
http://sources.redhat.com/systemtap/wiki/TipSkippedProbesOptimization
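
A rough Python sketch of how skipped probes can add up to an early
exit.  It assumes that a handler which cannot lock a global is skipped,
and that the run is aborted once the skip count passes a threshold (the
stap man page documents a MAXSKIPPED tunable with a default of 100).
The function name, the input format, and the exact comparison are
assumptions made for illustration:

```python
def run_probes(lock_busy, max_skipped=100):
    """Simulate probe hits under lock contention (hypothetical model).

    lock_busy is an iterable of booleans, one per probe hit; True means
    the handler could not lock a global variable and was skipped.
    Returns (handled, skipped, aborted).
    """
    handled = 0
    skipped = 0
    for busy in lock_busy:
        if busy:
            skipped += 1
            # Too many skips ends the whole run early.
            if skipped > max_skipped:
                return handled, skipped, True
        else:
            handled += 1
    return handled, skipped, False
```

In this model, anything that shortens the time a handler holds the lock
on a global, such as dropping an expensive call, lowers the skip rate
and lets the run go on longer before the threshold is reached.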


>> - try bumping up buffer size

> How do I do that?  the -s option?  'man stap' didn't state the
> default for -s

Yeah, that oversight should be corrected.  "stap -h" should list it, but
also doesn't.

> what do you suggest I start with?

You could try something on the order of 8 or 16 (megabytes), but I no
longer think it will matter.

- FChE
