Re: resuming after stop at syscall_entry

public inbox for systemtap@sourceware.org

* Re: resuming after stop at syscall_entry
       [not found] <20090418042722.5B584FC35F@magilla.sf.frob.com>
@ 2009-04-22 20:44 ` David Smith
  2009-04-25  0:58   ` Roland McGrath
  0 siblings, 1 reply; 3+ messages in thread
From: David Smith @ 2009-04-22 20:44 UTC
  To: Roland McGrath; +Cc: utrace-devel, Systemtap List

Roland McGrath wrote:

This processing makes sense I think.  It is a bit complicated of course,
but not unnecessarily so.

I'd like to ask you how this stuff would relate to systemtap (so I've
added the systemtap mailing list).  I've interspersed a few
comments/questions below.

... stuff deleted ...
> SYSCALL_ENTRY is unlike all other events.  Right after this callback
> loop is when the important user-visible stuff happens (the system call).
> So we stop immediately there as for the other two.  But, if another
> engine used UTRACE_STOP and maybe did something asynchronously, like
> modifying the syscall argument registers, you get no opportunity to see
> what happened.  Once all engines lift UTRACE_STOP, the system call runs.

... stuff deleted ...

> As explained above, the norm of interacting with other engines and their
> use of UTRACE_STOP is to use the final report.  When your callback's
> action argument includes UTRACE_STOP, you know an earlier engine might
> be fiddling before the thread resumes.  So, your callback can decide to
> return UTRACE_REPORT.  That ensures that some report_quiesce (or
> report_signal/UTRACE_SIGNAL_REPORT) callback will be made after the
> other engine lifts its UTRACE_STOP and before user mode.  At that point,
> you can see what user register values it might have installed, etc.  In
> all events but syscall entry, a final report_quiesce(0) serves this need.
> 
> My proposal is to extend this "resume report" approach to the syscall
> entry case.  That is, once some report_syscall_entry has returned
> UTRACE_STOP and we've stopped, allow for a second reporting pass after
> we've been resumed, before running the system call.  You'd get this pass
> if someone used UTRACE_REPORT.  That is, in the first callback loop, one
> engine used UTRACE_STOP and another used UTRACE_REPORT.  Then when the
> first engine used utrace_control() to resume, there would be a second
> reporting pass because of the second engine's earlier request.  Or, even
> if there was just one engine, but it used UTRACE_STOP and then used
> utrace_control(UTRACE_REPORT) to resume, then it would get the second
> reporting pass.  If someone uses UTRACE_STOP+UTRACE_REPORT in that pass,
> there would be a third pass, etc.
> 
> What I have in mind is that the second (and however many more) pass
> would just be another report_syscall_entry callback to everyone with
> UTRACE_EVENT(SYSCALL_ENTRY) set.  A flag bit in the action argument says
> this is a repeat notification.
> 
> I think this strikes a decent balance of not adding more callbacks and
> more arguments to bloat the API in general, while imposing a fairly
> simple burden on engines to avoid getting confused by multiple calls.
> 
> A tracing-only engine that just wants to see the syscall that is going
> to be done can just do:
> 
> 	if (utrace_resume_action(action) == UTRACE_STOP)
> 		return UTRACE_REPORT;
> 
> at the top of report_syscall_entry, so it just doesn't think about it
> until it thinks the call will now go through.

Systemtap currently doesn't support changing syscall arguments; if it
ever does, obviously a few things would need to change.

But, I think systemtap would probably fall here - only see the syscall
that is actually going to be done.  So systemtap could possibly get
multiple callbacks for the same syscall, but only pay attention to the
last one, correct?
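
A minimal userspace sketch of that "only pay attention to the last one"
behaviour might look like the following.  It is a mock, not kernel code:
the trace_ names, the enum values and the TRACE_RESUME_MASK constant are
stand-ins invented here so the logic compiles outside the kernel.  Only
the shape of the test at the top of the callback comes from the text
quoted above.

```c
#include <stdio.h>

/* Mock stand-ins for the utrace constants; the numeric values are
 * illustrative assumptions, not copied from the kernel headers. */
enum trace_resume_action { TRACE_STOP, TRACE_REPORT, TRACE_RESUME };
#define TRACE_RESUME_MASK      0x0fu	/* low bits: resume action so far */
#define TRACE_SYSCALL_RESUMED  0x40u	/* flag bit: repeat notification */

/* Mock of utrace_resume_action(): extract the resume action bits. */
static unsigned int trace_resume_action(unsigned int action)
{
	return action & TRACE_RESUME_MASK;
}

/* Number of times the engine actually traced a syscall entry. */
static int trace_entries;

/* Tracing-only engine: defer while an earlier callback asked to stop. */
static unsigned int trace_report_syscall_entry(unsigned int action, long nr)
{
	if (trace_resume_action(action) == TRACE_STOP)
		return TRACE_REPORT;	/* look again after the resume */

	trace_entries++;
	printf("syscall %ld about to run%s\n", nr,
	       (action & TRACE_SYSCALL_RESUMED) ? " (repeat callback)" : "");
	return TRACE_RESUME;
}
```

In this mock, a first call whose action says "stop" records nothing and
asks for another report; the tracing happens only on the later call,
once nobody is asking to stop any more.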

> Say an engine has a different agenda, just to see what syscall argument
> values came in from user mode before someone else changes them.  It does:
> 
> 	if (action & UTRACE_SYSCALL_RESUMED)
> 		return UTRACE_RESUME;
> 
> to ignore the additional callbacks that might come after somebody
> decided to stop and report.  It just does its work on the first one.
> 
> Here comes Renzo again!  He wants to have two or three or nineteen
> layers of the first kind of Renzo engine: each one stops at syscall
> entry, then resumes after changing some registers.  He wants these to
> "nest", meaning that after the "outermost" one stops, fiddles, and
> resumes, the "next one in" stops, looks at the register as fiddled by
> the outermost guy, fiddles in a different way, and resumes, and on and
> on.  Perhaps the first model (if last guy is stopping, punt to look
> again at resume report) works for that.  Or perhaps the engine also
> needs to keep track with its own state flag it sets whenever it does its
> work, and then resets in exit tracing to prepare for next time.
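
The "own state flag" variant mentioned at the end of that paragraph could
look roughly like the sketch below.  Again this is a userspace mock under
stated assumptions: struct nest_state stands in for whatever per-thread
data the engine keeps, and the nest_ callbacks and return codes are
hypothetical names, not the utrace API.

```c
#include <stdbool.h>

/* Illustrative return codes; not the kernel's definitions. */
enum nest_action { NEST_STOP, NEST_RESUME };

/* Hypothetical per-thread state kept by one engine. */
struct nest_state {
	bool entry_handled;	/* set once this syscall entry was processed */
	int  work_done;		/* how many times the real work ran */
};

/* Entry callback: do the work once per syscall, ignore later repeats. */
static enum nest_action nest_report_syscall_entry(struct nest_state *st)
{
	if (st->entry_handled)
		return NEST_RESUME;	/* repeat pass: nothing more to do */

	st->entry_handled = true;
	st->work_done++;
	return NEST_STOP;		/* stop so registers can be fiddled */
}

/* Exit callback: reset the flag to prepare for the next syscall. */
static enum nest_action nest_report_syscall_exit(struct nest_state *st)
{
	st->entry_handled = false;
	return NEST_RESUME;
}
```

The flag makes the engine's decision independent of how many repeat
callbacks other engines cause, at the cost of carrying state per thread.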

... stuff deleted ...

> So, even I can't write that much text and still think this interface
> choice is simple to understand.  But I kind of think it's around as
> simple as it can be for its mandates.  I'd appreciate any feedback.

This is understandable, but does hurt my head a *little* bit.  I think
if you put the above full text somewhere and provided some examples this
would make sense to people.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)


* Re: resuming after stop at syscall_entry
  2009-04-22 20:44 ` resuming after stop at syscall_entry David Smith
@ 2009-04-25  0:58   ` Roland McGrath
  2009-04-28 15:18     ` David Smith
  0 siblings, 1 reply; 3+ messages in thread
From: Roland McGrath @ 2009-04-25  0:58 UTC
  To: David Smith; +Cc: utrace-devel, Systemtap List

> This processing makes sense I think.  It is a bit complicated of course,
> but not unnecessarily so.

Glad to hear it!

> > A tracing-only engine that just wants to see the syscall that is going
> > to be done can just do:
> > 
> > 	if (utrace_resume_action(action) == UTRACE_STOP)
> > 		return UTRACE_REPORT;
> > 
> > at the top of report_syscall_entry, so it just doesn't think about it
> > until it thinks the call will now go through.
> 
> Systemtap currently doesn't support changing syscall arguments; if it
> ever does, obviously a few things would need to change.
> 
> But, I think systemtap would probably fall here - only see the syscall
> that is actually going to be done.  So systemtap could possibly get
> multiple callbacks for the same syscall, but only pay attention to the
> last one, correct?

Correct.  The advice quoted above is what its callbacks would do to ignore
the callbacks before the last one.

Note that you'll only be sure you're seeing "actually going to be done"
state if yours is the "first" engine attached.  (Thus, by the new special
case calling order, yours will be the last report_syscall_entry callback to
run.)  This is just the general "engine priority" thing, not anything new.

In cases like ptrace and kmview (Renzo's thing), even if these engines are
first (i.e. called after yours), you will still be seeing the "final" state
because they did their changes asynchronously before resuming.  But some
other engine might do its changes directly in its own callback instead
(whether it used UTRACE_STOP and got a repeat callback, or just on the
first time through without stopping), so those changes would happen only
after your "last" callback.

In the same vein, "earlier" engines (i.e. here called after yours) might
use UTRACE_STOP after your first callback had every reason to believe it
was the "last" one (i.e. the UTRACE_STOP test did not hit).  In that case, you will get
a repeat call (with UTRACE_SYSCALL_RESUMED flag).  On that call, you need
to cope with the fact that you already did your entry tracing work before
(but now things may have changed).  

If the theory is that you want to respect your place in the engine order,
whatever that is (i.e., if your tracing just reported a lie, it was the lie
you were supposed to believe), then "coping" just means ignoring the
repeat.  (This is no different in kind from an "earlier" engine/later
callback changing the registers after your callback and never stopping.)

For that you need to keep track of whether you already handled it or not.
(Depending on your relative order and the actions of the other engines, you
might get either UTRACE_STOP or UTRACE_SYSCALL_RESUMED either before or
after "you handled it".  So you can't use those alone.)  You can do this in
two ways.  One is to use your own per-thread state (engine->data, etc.).
The other is to disable the SYSCALL_ENTRY event when you've handled it, so
you won't get more callbacks.  Then you can re-enable the event in your
report_syscall_exit callback (or report_quiesce/report_signal, or whatever
is most convenient to be sure you'll run before it goes back to user mode).
i.e., use utrace_set_events() from the callbacks.
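
As a rough illustration of the second way, here is a userspace mock of
the disable-then-re-enable pattern.  It is a sketch under assumptions,
not the real interface: evt_set_events() only flips bits in a
hypothetical per-engine mask and is not utrace_set_events() itself, and
the event bit values and struct evt_engine are invented for the example.

```c
#include <stdbool.h>

/* Illustrative event bits; values are assumptions for this mock. */
#define EVT_SYSCALL_ENTRY  0x1ul
#define EVT_SYSCALL_EXIT   0x2ul

/* Hypothetical stand-in for an attached engine. */
struct evt_engine {
	unsigned long events;	/* events the engine currently wants */
	int entries_traced;	/* entry reports actually handled */
};

/* Stand-in for utrace_set_events(): replace the engine's event mask. */
static void evt_set_events(struct evt_engine *e, unsigned long mask)
{
	e->events = mask;
}

/* Deliver an entry report only if the engine still asks for it. */
static bool evt_deliver_entry(struct evt_engine *e)
{
	if (!(e->events & EVT_SYSCALL_ENTRY))
		return false;		/* event disabled: no callback */

	e->entries_traced++;
	/* Handled: drop SYSCALL_ENTRY so repeat passes skip this engine. */
	evt_set_events(e, e->events & ~EVT_SYSCALL_ENTRY);
	return true;
}

/* Exit report: re-arm entry tracing for the next system call. */
static void evt_deliver_exit(struct evt_engine *e)
{
	if (e->events & EVT_SYSCALL_EXIT)
		evt_set_events(e, e->events | EVT_SYSCALL_ENTRY);
}
```

Compared with a per-thread flag, this keeps the engine stateless but
relies on the exit (or quiesce/signal) callback always running before
the next entry.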

> This is understandable, but does hurt my head a *little* bit.  I think
> if you put the above full text somewhere and provided some examples this
> would make sense to people.

The utrace-syscall-resumed branch puts this in the kerneldoc text for
struct utrace_engine_ops (where callback return values and common arguments
are described):

  * When %UTRACE_STOP is used in @report_syscall_entry, then @task
+ * stops before attempting the system call.  In this case, another
+ * @report_syscall_entry callback follows after @task resumes; in a
+ * second or later callback, %UTRACE_SYSCALL_RESUMED is set in the
+ * @action argument to indicate a repeat callback still waiting to
+ * attempt the same system call invocation.  This repeat callback
+ * gives each engine an opportunity to reexamine registers another
+ * engine might have changed while @task was held in %UTRACE_STOP.
+ *
+ * In other cases, the resume action does not take effect until @task
+ * is ready to check for signals and return to user mode.  If there
+ * are more callbacks to be made, the last round of calls determines
+ * the final action.  A @report_quiesce callback with @event zero, or
+ * a @report_signal callback, will always be the last one made before
+ * @task resumes.  Only %UTRACE_STOP is "sticky"--if @engine returned
+ * %UTRACE_STOP then @task stays stopped unless @engine returns
+ * different from a following callback.
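
To make the repeat-callback rule in that text concrete, here is a toy
userspace model of the reporting loop.  It is deliberately simplified
and every name in it (sim_*, SIM_SYSCALL_RESUMED and its value) is an
assumption made for the sketch: it ignores engine calling order and
UTRACE_REPORT, and it treats each stop as being lifted immediately.

```c
#include <stddef.h>

/* Illustrative return codes and flag value; not the kernel's. */
enum sim_action { SIM_STOP, SIM_RESUME };
#define SIM_SYSCALL_RESUMED 0x40u

/* An engine callback sees the flags and its own private call counter. */
typedef enum sim_action (*sim_entry_cb)(unsigned int flags, int *calls);

/* Stops on the first pass only, e.g. to let someone edit registers. */
static enum sim_action sim_stop_once(unsigned int flags, int *calls)
{
	(*calls)++;
	return (flags & SIM_SYSCALL_RESUMED) ? SIM_RESUME : SIM_STOP;
}

/* Never stops. */
static enum sim_action sim_passive(unsigned int flags, int *calls)
{
	(void)flags;
	(*calls)++;
	return SIM_RESUME;
}

/*
 * Run reporting passes until no engine asks to stop, at which point the
 * system call would run.  Returns the number of passes made.
 * max_passes guards against an engine that never stops stopping.
 */
static int sim_syscall_entry(sim_entry_cb *cbs, int *calls, size_t n,
			     int max_passes)
{
	int pass = 0;
	int stopped;

	do {
		unsigned int flags = pass ? SIM_SYSCALL_RESUMED : 0;
		size_t i;

		stopped = 0;
		for (i = 0; i < n; i++)
			if (cbs[i](flags, &calls[i]) == SIM_STOP)
				stopped = 1;
		pass++;
	} while (stopped && pass < max_passes);

	return pass;
}
```

With one engine that stops once and one passive engine, the model makes
two passes and both engines are called on each, the second time with the
repeat flag set.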

I don't know where the longer explanation and/or examples belong.
Perhaps in a new section in utrace.tmpl?  We could start with putting
together some text on the wiki.  Another idea is to add a few example
modules in samples/utrace/.  Those can illustrate things with good
comments, and also could be built verbatim to load multiple
ones/instances in different orders and demonstrate what happens, etc.

It would be nice to have folks like you and Renzo work up this text
and/or examples.  What's needed is stuff that makes sense to you guys
as users of the API, rather than what makes sense to me who has
thought too much already about all this stuff.

Thanks,
Roland


* Re: resuming after stop at syscall_entry
  2009-04-25  0:58   ` Roland McGrath
@ 2009-04-28 15:18     ` David Smith
  0 siblings, 0 replies; 3+ messages in thread
From: David Smith @ 2009-04-28 15:18 UTC
  To: Roland McGrath; +Cc: utrace-devel, Systemtap List

Roland McGrath wrote:
>> This processing makes sense I think.  It is a bit complicated of course,
>> but not unnecessarily so.
> 
> Glad to hear it!
> 
>>> A tracing-only engine that just wants to see the syscall that is going
>>> to be done can just do:
>>>
>>> 	if (utrace_resume_action(action) == UTRACE_STOP)
>>> 		return UTRACE_REPORT;
>>>
>>> at the top of report_syscall_entry, so it just doesn't think about it
>>> until it thinks the call will now go through.
>> Systemtap currently doesn't support changing syscall arguments; if it
>> ever does, obviously a few things would need to change.
>>
>> But, I think systemtap would probably fall here - only see the syscall
>> that is actually going to be done.  So systemtap could possibly get
>> multiple callbacks for the same syscall, but only pay attention to the
>> last one, correct?
> 
> Correct.  The advice quoted above is what its callbacks would do to ignore
> the callbacks before the last one.
> 
> Note that you'll only be sure you're seeing "actually going to be done"
> state if yours is the "first" engine attached.  (Thus, by the new special
> case calling order, yours will be the last report_syscall_entry callback to
> run.)  This is just the general "engine priority" thing, not anything new.
> 
> In cases like ptrace and kmview (Renzo's thing), even if these engines are
> first (i.e. called after yours), you will still be seeing the "final" state
> because they did their changes asynchronously before resuming.  But some
> other engine might do its changes directly in its own callback instead
> (whether it used UTRACE_STOP and got a repeat callback, or just on the
> first time through without stopping), so those changes would happen only
> after your "last" callback.
> 
> In the same vein, "earlier" engines (i.e. here called after yours) might
> use UTRACE_STOP after your first callback had every reason to believe it
> was the "last" one (i.e. the UTRACE_STOP test did not hit).  In that case, you will get
> a repeat call (with UTRACE_SYSCALL_RESUMED flag).  On that call, you need
> to cope with the fact that you already did your entry tracing work before
> (but now things may have changed).  
> 
> If the theory is that you want to respect your place in the engine order,
> whatever that is (i.e., if your tracing just reported a lie, it was the lie
> you were supposed to believe), then "coping" just means ignoring the
> repeat.  (This is no different in kind from an "earlier" engine/later
> callback changing the registers after your callback and never stopping.)
> 
> For that you need to keep track of whether you already handled it or not.
> (Depending on your relative order and the actions of the other engines, you
> might get either UTRACE_STOP or UTRACE_SYSCALL_RESUMED either before or
> after "you handled it".  So you can't use those alone.)  You can do this in
> two ways.  One is to use your own per-thread state (engine->data, etc.).
> The other is to disable the SYSCALL_ENTRY event when you've handled it, so
> you won't get more callbacks.  Then you can re-enable the event in your
> report_syscall_exit callback (or report_quiesce/report_signal, or whatever
> is most convenient to be sure you'll run before it goes back to user mode).
> i.e., use utrace_set_events() from the callbacks.

It sounds like disabling SYSCALL_ENTRY then re-enabling it in the
report_syscall_exit() callback is a reasonable way to go.

>> This is understandable, but does hurt my head a *little* bit.  I think
>> if you put the above full text somewhere and provided some examples this
>> would make sense to people.
> 
> The utrace-syscall-resumed branch puts this in the kerneldoc text for
> struct utrace_engine_ops (where callback return values and common arguments
> are described):
> 
>   * When %UTRACE_STOP is used in @report_syscall_entry, then @task
> + * stops before attempting the system call.  In this case, another
> + * @report_syscall_entry callback follows after @task resumes; in a
> + * second or later callback, %UTRACE_SYSCALL_RESUMED is set in the
> + * @action argument to indicate a repeat callback still waiting to
> + * attempt the same system call invocation.  This repeat callback
> + * gives each engine an opportunity to reexamine registers another
> + * engine might have changed while @task was held in %UTRACE_STOP.
> + *
> + * In other cases, the resume action does not take effect until @task
> + * is ready to check for signals and return to user mode.  If there
> + * are more callbacks to be made, the last round of calls determines
> + * the final action.  A @report_quiesce callback with @event zero, or
> + * a @report_signal callback, will always be the last one made before
> + * @task resumes.  Only %UTRACE_STOP is "sticky"--if @engine returned
> + * %UTRACE_STOP then @task stays stopped unless @engine returns
> + * different from a following callback.
> 
> I don't know where the longer explanation and/or examples belong.
> Perhaps in a new section in utrace.tmpl?  We could start with putting
> together some text on the wiki.  Another idea is to add a few example
> modules in samples/utrace/.  Those can illustrate things with good
> comments, and also could be built verbatim to load multiple
> ones/instances in different orders and demonstrate what happens, etc.

The wiki would be fine - just somewhere that people could see this stuff.

> It would be nice to have folks like you and Renzo work up this text
> and/or examples.  What's needed is stuff that makes sense to you guys
> as users of the API, rather than what makes sense to me who has
> thought too much already about all this stuff.

We should probably just dump your email into the wiki.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
