public inbox for systemtap@sourceware.org
* how to handle userspace string copy failures
@ 2006-05-25 18:11 Martin Hunt
  2006-05-25 19:22 ` Frank Ch. Eigler
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Hunt @ 2006-05-25 18:11 UTC (permalink / raw)
  To: systemtap

With the recent change in page fault handling, we are seeing more
failures from user_string(). Unfortunately, that results in an error
being logged and the script terminating. This is partly my fault because
I fixed this once before and then changed it back when asked to,
forgetting why this was bad. So, for the record: we cannot guarantee
that userspace is always accessible, and such failures should
not terminate the script. At worst, I think we should print warnings. I
also propose that any user_string() request that fails should return
"<unknown>".

If there are no objections, I will check in this tapset change.

Martin



* Re: how to handle userspace string copy failures
  2006-05-25 18:11 how to handle userspace string copy failures Martin Hunt
@ 2006-05-25 19:22 ` Frank Ch. Eigler
  2006-05-25 21:05   ` Martin Hunt
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Ch. Eigler @ 2006-05-25 19:22 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap


hunt wrote:

> [...]  So, for the record, we cannot guarantee always being able to
> always access userspace 

We need to investigate to what extent this problem can be worked
around by other clever means.  For example, can we arrange to
preemptively fault in more parts of programs when systemtap probes are
running?

> and such failures should not terminate the script.

See the MAXERRORS parameter.

> At worst, I think we should print warnings.  I also propose that any
> user_string() request that fails should return "<unknown>".

I am uncomfortable with hard-coding such a decorated English term.  A
simple blank string would be fine.

I would be happier if the decision for treatment as a soft vs. hard
error were left up to the caller script.  One way to do this would be
to fork user_string() into two variants, one of which signals the
current sort of error-level fault (as does kernel_string()), and one
that just returns a sentinel soft-error value.  Hey, that sentinel
value could even be passed to it as an additional argument.

- FChE


* Re: how to handle userspace string copy failures
  2006-05-25 19:22 ` Frank Ch. Eigler
@ 2006-05-25 21:05   ` Martin Hunt
  2006-05-25 21:20     ` Frank Ch. Eigler
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Hunt @ 2006-05-25 21:05 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Thu, 2006-05-25 at 15:21 -0400, Frank Ch. Eigler wrote: 
> hunt wrote:
> 
> > [...]  So, for the record, we cannot guarantee always being able to
> > always access userspace 
> 
> We need to investigate to what extent this problem can be worked
> around by clever other ways.  For example, can we arrange to
> preemptively fault in more parts of programs when systemtap probes are
> running?

We should do that, but it doesn't solve the short-term problem, and it
will not solve the long-term problem either unless we find a way to
always fulfill userspace copies.

> > and such failures should not terminate the script.
> 
> See the MAXERRORS parameter.

That is for errors, which I do not consider this to be. At least it
shouldn't be confused with real errors, like when an array overflows and
no more data can be stored.

> > At worst, I think we should print warnings.  I also propose that any
> > user_string() request that fails should return "<unknown>".
> 
> I am uncomfortable with hard-coding such a decorated english term.  A
> simple blank string would be fine.

Blank strings do nothing to indicate that information was missing.

> I would be happier if the decision for treatment as a soft vs. hard
> error were left up to the caller script.  One way to do this would be
> to fork user_string() into two variants, one of which signals the
> current sort of error-level fault (as does kernel_string()), and one
> that just returns a sentinel soft-error value.  Hey, that sentinel
> value could even be passed to it as an additional argument.

What do you mean by "sentinel soft-error value"? How would this work?

I think we need to give more thought to error handling. A userspace
copy fails? Increment the error counter. A map overflows? Increment the
error counter. When the counter hits a user-defined threshold, terminate.
Any script doing userspace copies (which is almost all of them) needs to
set MAXERRORS so it won't stop the first time a userspace copy doesn't
complete, but doing so also ignores serious errors. Also, MAXERRORS is a
define, so changing it means a recompile, which is a problem for
cross-compiled scripts.

Instead maybe we need to classify errors into groups, set default
behavior for the groups and allow the user to set new defaults.
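
To make the grouping idea concrete, here is a minimal sketch in Python.
It only models the bookkeeping and is not SystemTap code. The group names
("user_copy", "map_overflow"), the ErrorPolicy class, and the choice of
defaults are all assumptions made for illustration; nothing here reflects
how MAXERRORS is actually implemented.

```python
from dataclasses import dataclass, field

# Hypothetical defaults: how many errors of each group are tolerated.
# None means "never terminate because of this group".
DEFAULT_LIMITS = {"user_copy": None, "map_overflow": 0}


@dataclass
class ErrorPolicy:
    """Per-group error counters with per-group termination limits."""
    limits: dict = field(default_factory=lambda: dict(DEFAULT_LIMITS))
    counts: dict = field(default_factory=dict)

    def record(self, group):
        """Count one error; return True if the script should terminate."""
        self.counts[group] = self.counts.get(group, 0) + 1
        # Unknown groups are treated as serious: no errors tolerated.
        limit = self.limits.get(group, 0)
        return limit is not None and self.counts[group] > limit
```

With these assumed defaults, a failed userspace copy is counted but never
stops the script, while the first map overflow does. A user who wants
stricter behavior passes different limits, for example
ErrorPolicy(limits={"user_copy": 2, "map_overflow": 0}).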

Or we can just provide two variants of each function that might cause an
error. We need to support simple scripts that don't care and shouldn't
have to be bothered to handle expected, but rare, problems like
userspace copy errors. We also need to support more sophisticated
scripts that might need to detect when that happens and handle it.


* Re: how to handle userspace string copy failures
  2006-05-25 21:05   ` Martin Hunt
@ 2006-05-25 21:20     ` Frank Ch. Eigler
  2006-05-25 22:42       ` Tim Bird
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Ch. Eigler @ 2006-05-25 21:20 UTC (permalink / raw)
  To: Martin Hunt; +Cc: systemtap

Hi -

hunt wrote:

> [...]
> > We need to investigate to what extent this problem can be worked
> > around by clever other ways.  For example, can we arrange to
> > preemptively fault in more parts of programs when systemtap probes are
> > running?
> 
> We should do that, but it doesn't solve the short-term problem and it
> will not solve the problem for the long-term unless we find a way to
> always fulfill userspace copies,

"The problem" is not that some userspace pages will be inaccessible.
It is that so many interesting ones seem to be suddenly inaccessible
right now.


> > > and such failures should not terminate the script.
> > 
> > See the MAXERRORS parameter.
> 
> That would be for errors, which I do not consider this. At least it
> shouldn't be confused with real errors, like when an array overflows and
> no more data can be stored.

We do not have many kinds of "unusual condition" indications.  We have
soft errors, which are quiet and don't interrupt control flow, and
hard errors, which are noisy and do interrupt control flow.  I believe
there is no third category at the moment.

One could argue that array overflow could be turned into a soft error,
analogously to a string value that is too long and is quietly
truncated.  Whether it should be one or the other is a matter of
judgement: how much each kind of failure matters.


> > > At worst, I think we should print warnings.  I also propose that any
> > > user_string() request that fails should return "<unknown>".
> > 
> > I am uncomfortable with hard-coding such a decorated english term.  A
> > simple blank string would be fine.
> 
> blank strings do nothing to indicate that information was missing.

Nor does "<unknown>", except to an English-speaking user looking over
a literally transcribed output after the fact.  That's the point.


> [...]  What do you mean by "sentinel soft-error value". 

The "sentinel value" term is from baby computer science - a special
value intermingled into a data stream to identify an unusual
condition, like "9999" to end a list of input numbers.  A "soft-error
value" is a value that results from a soft error.  (*some* legal value
must result, since control flow is uninterrupted, and thus a value
must be propagated into the expressions.)

> How would this work? [...]

We retain user_string(addr) just as it is now: a hard error if it
faults for whatever reason.  We add a new function
user_string_mayfail(addr,str), which quietly and softly returns the
str argument if the access faulted.  Your syscall tapset routines
would presumably use the second variant. (For bonus points, we support
overloading user_string() with one vs. two parameters.)
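
For illustration only, the two variants can be modeled in Python as
below. This is a sketch of the semantics, not SystemTap code: the
FAKE_USER_MEMORY table and the UserCopyFault exception are hypothetical
stand-ins for a userspace copy that may fault, and raising the exception
stands in for a hard error that interrupts the probe handler.

```python
class UserCopyFault(Exception):
    """Models a hard error: control flow in the caller is interrupted."""


# Hypothetical stand-in for user memory. An address that is missing
# from the table models a page that cannot be accessed.
FAKE_USER_MEMORY = {0x1000: "/etc/passwd"}


def user_string(addr):
    """Hard-error variant: raises if the string cannot be copied."""
    try:
        return FAKE_USER_MEMORY[addr]
    except KeyError:
        raise UserCopyFault(f"user string copy fault at {addr:#x}") from None


def user_string_mayfail(addr, fallback):
    """Soft-error variant: quietly returns the caller's sentinel on a fault."""
    try:
        return user_string(addr)
    except UserCopyFault:
        return fallback
```

Because the sentinel is an argument, the caller decides what a failed
copy looks like: user_string_mayfail(addr, "<unknown>") and
user_string_mayfail(addr, "") are both possible without hard-coding
either choice in the tapset.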


- FChE


* Re: how to handle userspace string copy failures
  2006-05-25 21:20     ` Frank Ch. Eigler
@ 2006-05-25 22:42       ` Tim Bird
  0 siblings, 0 replies; 5+ messages in thread
From: Tim Bird @ 2006-05-25 22:42 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Martin Hunt, systemtap

Frank Ch. Eigler wrote:
>>>> At worst, I think we should print warnings.  I also propose that any
>>>> user_string() request that fails should return "<unknown>".
>>> I am uncomfortable with hard-coding such a decorated english term.  A
>>> simple blank string would be fine.
>> blank strings do nothing to indicate that information was missing.
> 
> Nor does "<unknown>", except to an english-speaking user looking over
> a literally transcribed output after the fact.  That's the point.

The string "<unknown>" has much more information content,
even for a non-English speaker, than "".  At least you
can google it.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

