how to discover description of error which terminated run

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

* how to discover description of error which terminated run
@ 2012-12-05 17:02 John Lumby
  2012-12-05 21:55 ` Mark Wielaard
  0 siblings, 1 reply; 7+ messages in thread
From: John Lumby @ 2012-12-05 17:02 UTC (permalink / raw)
  To: systemtap

Recently my system tap runs have been terminating abruptly during a run,
(long after the start of the subject workload so not initialisation-related,
 but also long before the workload itself ends)
and printing something like this at the end:

WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/local/bin/staprun exited with status: 1
Pass 5: run completed in 20usr/30sys/55140real ms.
Pass 5: run failed.  Try again with another '--vp 00001' option.
Keeping temporary directory "/tmp/stapOlPtXe"

Prior to this it has successfully and correctly printed all the output from all probes
hit till then (they are all user probes by the way  -  I am not using the kprobe)

It has the look of hitting some resource limit but I can't be sure.
How can I get stap to tell me the description of the error which occurred?
(and also of course make it continue )

This is system tap pulled from git yesterday (4 December 2012)
compiled against elfutils-0.152 ,   running under a linux amd-64 3.6.6

John Lumby

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: how to discover description of error which terminated run
  2012-12-05 17:02 how to discover description of error which terminated run John Lumby
@ 2012-12-05 21:55 ` Mark Wielaard
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Wielaard @ 2012-12-05 21:55 UTC (permalink / raw)
  To: John Lumby; +Cc: systemtap

On Wed, Dec 05, 2012 at 12:02:32PM -0500, John Lumby wrote:
> 
> Recently my system tap runs have been terminating abruptly during a run,
> (long after the start of the subject workload so not initialisation-related,
>  but also long before the workload itself ends)
> and printing something like this at the end:
> 
> WARNING: Number of errors: 1, skipped probes: 0
> WARNING: /usr/local/bin/staprun exited with status: 1
> Pass 5: run completed in 20usr/30sys/55140real ms.
> Pass 5: run failed.  Try again with another '--vp 00001' option.
> Keeping temporary directory "/tmp/stapOlPtXe"
> 
> Prior to this it has successfully and correctly printed all the output from all probes
> hit till then (they are all user probes by the way  -  I am not using the kprobe)
> 
> It has the look of hitting some resource limit but I can't be sure.
> How can I get stap to tell me the description of the error which occurred?
> (and also of course make it continue )

It is strange there is no actualy ERROR reported.
Maybe look at dmesg or /var/log/messages output to see if there is
a hint there?
If there was an actual error message, then the following page has
a couple of hints how to get past some resource limit:
http://sourceware.org/systemtap/wiki/TipExhaustedResourceErrors

Cheers,

Mark

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: how to discover description of error which terminated run
  2012-12-10 14:53       ` johnlumby
@ 2012-12-10 15:22         ` Frank Ch. Eigler
  0 siblings, 0 replies; 7+ messages in thread
From: Frank Ch. Eigler @ 2012-12-10 15:22 UTC (permalink / raw)
  To: johnlumby; +Cc: Mark Wielaard, systemtap maillist

Hi -

> >>I believe errors/warnings already go to stderr as opposed to stdout.
> >>I opened PR14927 to make them even more visible.  (Want to give coding
> >>that up a try?  We'll help!)
> 
> I read the Bug Report.    Looks interesting.    I can certainly try.
> Just point me to where the error messages are written.

In the context of staprun/stapio (pass-5 error messages coming from
the running script), they are pretty localised.  

One way would be to change SRC/staprun/common.c function eprintf().
Around we could emit a \033[31m ... \033[m wrapper around the vfprintf(stderr),
but conditional on isatty(stdout-filehandle), and/or a runtime configuration
item such as a SYSTEMTAP_NOCOLOR environment variable.

- FChE

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: how to discover description of error which terminated run
  2012-12-07 10:06     ` Mark Wielaard
@ 2012-12-10 14:53       ` johnlumby
  2012-12-10 15:22         ` Frank Ch. Eigler
  0 siblings, 1 reply; 7+ messages in thread
From: johnlumby @ 2012-12-10 14:53 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Frank Ch. Eigler, systemtap maillist

On 12/07/12 05:06, Mark Wielaard wrote:
> On Thu, 2012-12-06 at 18:29 -0500, Frank Ch. Eigler wrote:
>> johnlumby wrote:
>>
>>> [...]  What surprises me is that the run continued for a full 0.1
>>> seconds (over 4000 lines of output) after this message was logged
>>> before it shut itself down.  [...]
>> The intent is not to lose information unnecessarily; the transport of
>> that queued information is lightweight enough not to cause overload.
> That and we prioritize reporting warning/error messages. When there are
> messages from the kernel to user space then warning/errors get reported
> first, even when some other messages might still be pending to make sure
> errors are always presented to the user. Which in this case made it
> harder for the user to spot the error, because we were also able deliver
> all the normal messages after reporting the error... :{
>
>>> [...] Makes me wonder  -  is there some way of directing error
>>> messages such as this to some other channel rather than interspersing
>>> them into the run output.   Or else at least making them more distinctive.

I checked how  I was running stap and now I see I was re-directing 
stderr to stdout !
So it was my fault I missed it.      I will switch to using tee for 
stderr in future
and that should solve it.

>> I believe errors/warnings already go to stderr as opposed to stdout.
>> I opened PR14927 to make them even more visible.  (Want to give coding
>> that up a try?  We'll help!)

I read the Bug Report.    Looks interesting.    I can certainly try.
Just point me to where the error messages are written.

> O, color coded error/warnings! Nice idea.
>
> Thanks,
>
> Mark
>
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: how to discover description of error which terminated run
  2012-12-07  0:28   ` Frank Ch. Eigler
@ 2012-12-07 10:06     ` Mark Wielaard
  2012-12-10 14:53       ` johnlumby
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Wielaard @ 2012-12-07 10:06 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: John Lumby, systemtap maillist

On Thu, 2012-12-06 at 18:29 -0500, Frank Ch. Eigler wrote:
> johnlumby wrote:
> 
> > [...]  What surprises me is that the run continued for a full 0.1
> > seconds (over 4000 lines of output) after this message was logged
> > before it shut itself down.  [...]
> 
> The intent is not to lose information unnecessarily; the transport of
> that queued information is lightweight enough not to cause overload.

That and we prioritize reporting warning/error messages. When there are
messages from the kernel to user space then warning/errors get reported
first, even when some other messages might still be pending to make sure
errors are always presented to the user. Which in this case made it
harder for the user to spot the error, because we were also able deliver
all the normal messages after reporting the error... :{

> >[...] Makes me wonder  -  is there some way of directing error
> > messages such as this to some other channel rather than interspersing
> > them into the run output.   Or else at least making them more distinctive.
> 
> I believe errors/warnings already go to stderr as opposed to stdout.
> I opened PR14927 to make them even more visible.  (Want to give coding
> that up a try?  We'll help!)

O, color coded error/warnings! Nice idea.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: how to discover description of error which terminated run
  2012-12-06 15:03 ` John Lumby
@ 2012-12-07  0:28   ` Frank Ch. Eigler
  2012-12-07 10:06     ` Mark Wielaard
  0 siblings, 1 reply; 7+ messages in thread
From: Frank Ch. Eigler @ 2012-12-07  0:28 UTC (permalink / raw)
  To: John Lumby; +Cc: systemtap maillist, mark


johnlumby wrote:

> [...]  What surprises me is that the run continued for a full 0.1
> seconds (over 4000 lines of output) after this message was logged
> before it shut itself down.  [...]

The intent is not to lose information unnecessarily; the transport of
that queued information is lightweight enough not to cause overload.

>[...] Makes me wonder  -  is there some way of directing error
> messages such as this to some other channel rather than interspersing
> them into the run output.   Or else at least making them more distinctive.

I believe errors/warnings already go to stderr as opposed to stdout.
I opened PR14927 to make them even more visible.  (Want to give coding
that up a try?  We'll help!)


- FChE

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: how to discover description of error which terminated run
       [not found] <1354805036.18914.ezmlm@sourceware.org>
@ 2012-12-06 15:03 ` John Lumby
  2012-12-07  0:28   ` Frank Ch. Eigler
  0 siblings, 1 reply; 7+ messages in thread
From: John Lumby @ 2012-12-06 15:03 UTC (permalink / raw)
  To: systemtap maillist; +Cc: mark


Mark Wielaard wrote:
> 
> > It has the look of hitting some resource limit but I can't be sure.
> > How can I get stap to tell me the description of the error which occurred?
> > (and also of course make it continue )
> 
> It is strange there is no actualy ERROR reported.
> Maybe look at dmesg or /var/log/messages output to see if there is
> a hint there?
> If there was an actual error message, then the following page has
> a couple of hints how to get past some resource limit:
> http://sourceware.org/systemtap/wiki/TipExhaustedResourceErrors
> 
> Cheers,
> 
> Mark
> 

Thanks Mark  -
your comments made me wonder if in fact there actually was some other error message
that was written much earlier  --   and there was!     It took some finding but after some
lengthy grep -v'ing I found this

ERROR: probe overhead exceeded threshold

And this is indeed described in the document you referenced.

What surprises me is that the run continued for a full 0.1 seconds (over 4000 lines
of output) after this message was logged before it shut itself down. 
(And it was hard to find because many of my probes emit a line with words
such as error and failed in them).

Makes me wonder  -  is there some way of directing error messages such as this to
some other channel rather than interspersing them into the run output.   Or else
at least making them more distinctive.

But thanks again  -  problem solved

John
 		 	   		  

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-12-10 15:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-05 17:02 how to discover description of error which terminated run John Lumby
2012-12-05 21:55 ` Mark Wielaard
     [not found] <1354805036.18914.ezmlm@sourceware.org>
2012-12-06 15:03 ` John Lumby
2012-12-07  0:28   ` Frank Ch. Eigler
2012-12-07 10:06     ` Mark Wielaard
2012-12-10 14:53       ` johnlumby
2012-12-10 15:22         ` Frank Ch. Eigler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).