public inbox for gdb@sourceware.org
* Re: Using reverse execution
@ 2005-09-20 22:47 Michael Snyder
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Snyder @ 2005-09-20 22:47 UTC (permalink / raw)
  To: gdb; +Cc: shebs


 >> That's not what you do when you trace a bug.  You start from
 >> the place where, e.g., the program gets a SIGSEGV, and then
 >> unroll it back to possible places where the corruption could
 >> have happened.  That is, you try to guess where the problem
 >> could have originated from, and then get there and look around
 >> for clues.  I don't find this jarring in any way.
 >
 > But have you actually done any debugging by reverse execution
 > yourself?

I have.  I've been using it to debug real bugs, difficult ones,
in a realtime embedded OS.  I've got a prototype gdb working
with the Simics simulator, with all of the reverse-* commands
pretty much working: reverse-continue, step, stepi, next,
nexti, and finish.  Breakpoints and watchpoints also work
in reverse.

I'll give you my best example, which follows a scenario
that Stan outlined near the beginning of this thread.

I've got multiple threads, and one of them is blowing its
stack.  Unfortunately it doesn't cause an immediate problem --
it isn't detected until the scheduler does a sanity check at
the next task switch point, and discovers that the guard word
at the end of the stack is gone.  At that point, it panics.
This is essentially like segfaulting when you write through a
bad pointer -- you need to know who wrote the bad value to
the pointer, and that will be the LAST person who changed
it.  Many people may have changed it before then.

But -- all I had to do was run forward until the stack
corruption was detected (by analogy, to the segfault),
and then put a watchpoint on the clobbered memory
location and run backward.  Bingo -- the first time
the watchpoint triggers, I have my culprit.
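
A toy sketch of this search in Python, not GDB or Simics code.  It
assumes the execution history is available as a list of recorded
memory writes; the names Write and last_writer_before, the addresses,
and the task names are all hypothetical.  Running backward from the
failure with a watchpoint on the clobbered address then amounts to
scanning that history from the end and stopping at the first hit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Write:
    step: int    # position in the recorded execution
    who: str     # task or function that performed the write (illustrative)
    addr: int    # memory location written
    value: int   # value stored


def last_writer_before(history: Sequence[Write], addr: int,
                       fail_step: int) -> Optional[Write]:
    """Scan backward from fail_step; the first hit is the most recent write to addr."""
    for w in reversed(history):
        if w.step < fail_step and w.addr == addr:
            return w
    return None


GUARD = 0x1000  # hypothetical address of the stack guard word

history = [
    Write(1, "init", GUARD, 0xDEADBEEF),  # guard word set up
    Write(2, "task_a", 0x2000, 7),
    Write(3, "task_b", GUARD, 0),         # guard word clobbered here
    Write(4, "task_a", 0x2000, 8),
]

# The scheduler notices the damage at step 5; look backward from there.
culprit = last_writer_before(history, GUARD, fail_step=5)
print(culprit.who)
```

Earlier writers of the same location (here "init") are never visited,
which is the point: only the last write before the failure matters.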

Michael Snyder
(still at Red Hat, don't be confused by the email address)

* Re: Using reverse execution
@ 2005-09-20 23:11 Michael Snyder
  2005-09-24  0:07 ` Stan Shebs
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Snyder @ 2005-09-20 23:11 UTC (permalink / raw)
  To: gdb; +Cc: shebs

 > Depending on the answers, the project could be fatally flawed.
 > For instance, if the ability to undo system calls is critical
 > for usability, that pretty much relegates reversal to simulator
 > targets only - not interesting for my user base. That's why I
 > wanted to talk about usage patterns; if users don't need the
 > debugger to do the incredibly hard things, then we can get to
 > something useful sooner.

Here's the thing, though, Stan --

We can separate the debugger implementation questions from
the target-side implementation questions.  Whether I/O can
be "undone", whether system calls can be reversed over, even
whether the target can proceed forward again from a point
that it has reversed back to -- these are all things about
which gdb need not concern itself.  They're target-side
details.

Think about forward execution.  Does gdb know anything
about system calls?  In general, not.  Does it know anything
about I/O?  Definitely not, except in some special cases.
GDB knows about step, continue, and why-did-we-stop?
Those are its primitives.

If we make the CORE PART of gdb do nothing more than use
similar primitives for backward debugging, then it will
"just work".  I know this, 'cause I've done it.  We may
need to build some more intimate details into SOME gdb
back-ends, or implement a separate module that can do
certain things such as checkpoints for a target that
can't do them for itself -- but the core part of gdb
doesn't need to know about that, and those considerations
need not hold up the development of reverse execution
in the core part of gdb.

Separate the debugging of reverse-execution from the
question of how the reverse-execution is to be done.
I know, you need to consider both, and there's definitely
cross-over, but what I am saying is that we CAN
separate them, and that gdb will be better if we do.
The part of gdb that controls execution (infrun and
infcmd, for instance) SHOULD not know how the backend
or the target "works".

The target, on the other hand, may have lots of
capabilities, and it may not.  Maybe it can only
"back up" until the first system call, and then
it gives up.  Well, then gdb just needs to know
how to handle a target that can do some reverse
executing, but then can't do more.  That's general --
because another target may have a "buffer" of saved
state for reverse execution, and it may eventually
reach the beginning of that buffer.  Infrun doesn't
necessarily need to know WHY the target can't go
backward any more, just that it can't.  Although
of course we might encode some common reasons and
give some meaningful failure message, it isn't
essential to the implementation of reverse debugging.
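
A minimal Python sketch of this separation, under stated assumptions:
ToyTarget, Stop, and resume are hypothetical names and do not
correspond to GDB's actual target interface.  The core-side loop only
asks the target to step in some direction and looks at why it stopped;
the bounded history buffer lives entirely on the target side.

```python
from enum import Enum, auto


class Stop(Enum):
    STEPPED = auto()      # one step completed, nothing of interest
    BREAKPOINT = auto()   # arrived at a breakpoint
    NO_HISTORY = auto()   # target cannot go back any further
    EXITED = auto()       # ran off the end of the program


class ToyTarget:
    """Hypothetical target: a program counter plus a bounded buffer of saved states."""

    def __init__(self, program_len, history_limit, breakpoints=()):
        self.pc = 0
        self.program_len = program_len
        self.history = []                 # saved pcs, oldest first
        self.history_limit = history_limit
        self.breakpoints = set(breakpoints)

    def step(self, reverse=False):
        if reverse:
            if not self.history:
                return Stop.NO_HISTORY    # the target's problem, reported generically
            self.pc = self.history.pop()
        else:
            if self.pc >= self.program_len:
                return Stop.EXITED
            self.history.append(self.pc)
            if len(self.history) > self.history_limit:
                self.history.pop(0)       # oldest state falls out of the buffer
            self.pc += 1
        return Stop.BREAKPOINT if self.pc in self.breakpoints else Stop.STEPPED


def resume(target, reverse=False):
    """Core-side 'continue': the same loop serves both directions."""
    while True:
        reason = target.step(reverse=reverse)
        if reason is not Stop.STEPPED:
            return reason


t = ToyTarget(program_len=10, history_limit=3, breakpoints={8})
print(resume(t), t.pc)                # runs forward to the breakpoint
print(resume(t, reverse=True), t.pc)  # runs backward until the buffer is empty
```

Note that resume never inspects the history buffer.  A target that
gives up at the first system call could return the same NO_HISTORY
reason and the loop would not change.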


* Re: Using reverse execution
@ 2005-09-20 22:56 Michael Snyder
  2005-09-20 23:14 ` Ian Lance Taylor
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Michael Snyder @ 2005-09-20 22:56 UTC (permalink / raw)
  To: gdb; +Cc: shebs

 >>> As a comparison, for tracepoints we came up with various
 >>> scenarios for how they would be amazingly useful and powerful,
 >>> and yet after nearly a decade they remain a curiosity in GDB.
 >>
 >> IMHO, tracepoints remain a curiosity because they were never
 >> implemented on a large enough number of platforms.  Lack of
 >> native support, in particular, is the main reason for their non-use.
 >
 > But don't you think it's telling that not one single person
 > was willing to go to the trouble of implementing it on more
 > platforms?  When breakpoints don't work on a platform, users
 > don't say "oh well, we'll just have to do without". Apparently
 > tracepoints are just not a must-have.

Eli remarked that the usefulness of reverse execution was a
no-brainer for him, and it's obviously a no-brainer for you
and me and a number of other GDB maintainers.

And yet -- I have a target audience of engineers to whom
I've been trying to "sell" reverse execution -- and I have
a working implementation that I can demo, live, and a real-life
bug that I can show to be easy to debug with reverse execution,
and pretty damn hard otherwise.  And the majority of them will
go "wow", but they aren't jumping up and down demanding access
to this cool facility.

I think this is a familiar concept to us, but an unfamiliar
one for many users, and they may have to get their hands on
it and actually use it and play with it before they start to
get a feel for its true power.

The same may have been true for tracepoints.  There were some
people who went "wow", and even a few who took a stab at doing
a target implementation -- but few people ever actually got to
get their hands on it and play with it.  Even a live demo is
not always as convincing as that.

* Using reverse execution
@ 2005-09-13  1:17 Stan Shebs
  2005-09-13  3:43 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Stan Shebs @ 2005-09-13  1:17 UTC (permalink / raw)
  To: gdb

Hi all, I've been spending some time recently on the reverse execution
idea that's been getting some airplay since May or so
(http://sources.redhat.com/ml/gdb/2005-05/msg00145.html and
subsequent), and would like to bring people up to date on some of
my thinking, plus solicit ideas.

The context is Darwin aka Mac OS X native debugging (surprise), and
the idea is to make it something that any application developer using
the Xcode environment could use. There are of course a hundred
practical difficulties (how do you un-execute an IP packet send?  how
do you reverse a billion instructions executed on multi-GHz
hardware?), so this is more of a research project at this point; a
real-life usable implementation probably entails extensive kernel
hacking, but right now we don't know enough even to tell kernel people
what facilities we want them to add. In this message, I'm going to
focus on the user model, without trying to tie things down to a
specific implementation.

So my big question is: what is reverse execution good for? Thinking
about some of the difficulties I allude to above, it's easy to
conclude that maybe reverse execution is just "party tricks" - an
impressive demo perhaps, but not a feature that real-life users would
ever adopt.  Since the May discussion, I've been watching myself while
debugging other things (like a GDB merge :-) ), and asking "if I had
reverse execution, when would I use it?".

The thing that jumped out at me most strongly was reverse execution as
an "undo" facility.

For instance, when stepping through unfamiliar and complicated code,
it's very common to "next" over an apparently innocuous function, then
say "oh sh*t" - your data has magically changed for the worse and so
the function you nexted over must be the guilty party. But it's often
too late - you've passed by the key bit of wrong code, and need to
re-run. Much of the time this is OK, and only takes a moment; but if
your application is complicated (like an iTunes), or if you have a
complicated sequence of breaks and continues and user input to get to
the point of interest, re-running starts to get slow and
error-prone. You may also have a situation where the bug is
intermittent and not guaranteed to appear on rerunning (if that sounds
familiar, hold the thought for a moment). So in these kinds of cases,
what you really want is to undo that last "next" so you can do a "step"
and descend into the function of unexpected interest.

A similar case might occur with single-stepping through a series of
calculations. I suspect everybody has at one time or another stepped
over something like "a *= b;", printed the value of a only to get a
bogus value, then either mentally calculated a / b or typed in "p a /
b", as a quick way to recover the last value of a.  It would have been
easier and faster just to back up by one line and print (or watch)
a. If the calculation is complicated, the manual un-calculate exposes
the user to blind alleys if the calculation was mistaken. For instance,
if you try to manually undo some pointer arithmetic, you might
mentally adjust by chars when you should be adjusting by ints, and
then be misled because you think that the bug is that the program is
writing bad stuff into memory, when it's the pointer value that's
mistaken.
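
To make the "a *= b" case concrete, here is a small Python sketch,
assuming a toy machine with 32-bit unsigned variables; UndoLog and its
methods are hypothetical names, not a GDB facility.  When the multiply
wraps around, computing a / b by hand gives a plausible but wrong
"previous value", while an undo log that saved the old value restores
it exactly.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit unsigned wraparound


class UndoLog:
    """Toy state with an undo log: every assignment saves the old value first."""

    def __init__(self, state):
        self.state = dict(state)
        self._log = []

    def assign(self, name, value):
        # Variables are assumed to exist already; KeyError otherwise.
        self._log.append((name, self.state[name]))
        self.state[name] = value & MASK32

    def undo(self):
        name, old = self._log.pop()
        self.state[name] = old


m = UndoLog({"a": 100_000, "b": 100_000})
m.assign("a", m.state["a"] * m.state["b"])   # a *= b, wraps past 2**32

manual_guess = m.state["a"] // m.state["b"]  # the "p a / b" shortcut
m.undo()                                     # back up by one assignment

print(manual_guess)    # 14100, not the real previous value
print(m.state["a"])    # 100000
```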

The key tradeoff for reverse execution as an undo facility is complexity
of rerunning. If rerunning is a cheap part of the debugging session,
then the undo facility is not going to seem that important.

Another use for reverse execution is a more general form of zeroing in
on a bug. Suppose you have a bogus pointer that was supposedly
returned by malloc() somewhere earlier in the program's
execution. That pointer may only occasionally sit in a named variable, and
the rest of the time it is wandering around in various other data structures. There's
no single thing to watch in this case; it's not the memory being
pointed to that's the problem, it's that the pointer itself goes "off
the reservation". So what you want to do is start from the point at
which you've discovered the pointer is bad (segfault perhaps), watch
the location holding the bad pointer, and run backwards until either
a) somebody modifies the pointer in place, or b) the bogus pointer is
copied from elsewhere, in which case you watch it and continue
backwards. In many cases you'll get to the bad code sooner than by
running forwards, where you have to anticipate which malloc will
produce the later-to-be-scrambled pointer, and generally trace along
trying to anticipate who's going to do what to it before the bad thing
happens. (The display of ignored breakpoint hits was introduced as a
way to speed this up some.) Again, as with undo, the efficiency of
this process vs re-running depends on whether the actual bug occurs
closer to the beginning of execution, or closer to the point of
failure. One could make an argument that most root-cause bugs tend to
occur closer to failure points than to the beginning of program
execution, but that's kind of a philosophical point about program
design for which I have no concrete evidence.
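
The backward search over copies can be sketched in Python as follows.
This is only an illustration over an assumed recorded event list;
Event, trace_origin, and the location names are hypothetical.  The
watched location starts at the one holding the bad pointer.  Walking
the history backward, an in-place modification ends the search (case
a), while a copy moves the watch to the source location (case b).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Event:
    step: int
    dst: str             # location written
    src: Optional[str]   # location copied from; None for an in-place modification
    note: str


def trace_origin(events: Sequence[Event], start_loc: str,
                 fail_step: int) -> List[Event]:
    """Follow the bad value backward; the last element is the suspected origin."""
    watched = start_loc
    path = []
    for ev in sorted(events, key=lambda e: e.step, reverse=True):
        if ev.step >= fail_step or ev.dst != watched:
            continue               # the watchpoint does not trigger
        path.append(ev)
        if ev.src is None:
            break                  # case (a): modified in place
        watched = ev.src           # case (b): copied, so watch the source
    return path


events = [
    Event(1, "node.buf", None, "malloc result stored"),
    Event(2, "node.buf", None, "pointer bumped past the end"),
    Event(3, "cache[3]", "node.buf", "copied into cache"),
    Event(4, "count", None, "unrelated write"),
    Event(5, "p", "cache[3]", "copied into local p"),
]

# The segfault happens at step 6 while dereferencing p.
for ev in trace_origin(events, "p", fail_step=6):
    print(ev.step, ev.dst, ev.note)
```

Only three of the five events are ever examined here, and the original
malloc at step 1 is never reached, which matches the intuition that the
backward search pays off when the bug sits close to the failure.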

Then there is stepping backwards by instructions to retrace what is
happening at the machine level. I'm less inclined to say this is
valuable; picking apart registers and raw memory is a rather
painstaking activity, so slow (at the human level), that the time to
re-run up to the line in question is usually negligible by
comparison. Even so, I can see it becoming very natural for a user to
do a step, see bogus data that simply can't be explained by the source
line on the screen, do a reverse-step and then multiple stepi's to
"slo-mo" the calculations of that line's compiled code.

I touched on hard-to-repeat cases briefly above - GDB mavens will
recognize this as one of the rationales for the tracepoint facility.
Reverse execution is similar in that once you've gotten the program
into a state where a problem manifests, you want to poke around in the
program's immediate past states. Tracepoints however are designed such
that the user needs to anticipate what data will be interesting;
sensible in a decoupled remote debugging context, but not so good for
the data-driven spur-of-the-moment experimentation that is part of a
productive debugging session. So a working reverse execution gives the
user freedom to look around a program's entire state while moving up
and down along the flow of execution. (Ironically, this capability
might work against good program design, in that it takes away some
incentive to design a program with repeatable behavior. For instance,
programs using a random number generator often include machinery to
display and input RNG seeds, one of the uses being to guarantee
predictability while re-running under a debugger.)

But will users actually use any of this in real life? "Undo" is pretty
easy - everybody understands "undo", even nonprogrammers, with many
GUIs giving it a dedicated keystroke. Tracking data backwards through
a program is a powerful tool for a tough class of bugs, but as we know
from past experience, powerful features that are at all hard to use
are often ignored. Single-instruction reverse stepping is conceptually
simpler, but likely to see more interest from the low-level
developers, and may only be interesting if available for kernel
debugging and the like. Reproducibility problems crop up regularly, so
I can see people wanting to use reverse execution after a breakpoint
sets them down in rarely-executed code.

Once we have an idea of what we think users will want from the
feature, we'll have a better idea of what characteristics and
limitations might be acceptable in an implementation.

Stan




end of thread, other threads:[~2005-09-27 22:00 UTC | newest]

Thread overview: 38+ messages
2005-09-20 22:47 Using reverse execution Michael Snyder
  -- strict thread matches above, loose matches on Subject: below --
2005-09-20 23:11 Michael Snyder
2005-09-24  0:07 ` Stan Shebs
2005-09-20 22:56 Michael Snyder
2005-09-20 23:14 ` Ian Lance Taylor
2005-09-21  3:40   ` Eli Zaretskii
2005-09-21  4:00     ` Ian Lance Taylor
2005-09-21 17:52       ` Eli Zaretskii
2005-09-21 20:37       ` Michael Snyder
2005-09-24  0:46         ` Stan Shebs
2005-09-24  1:10           ` Michael Snyder
2005-09-24 10:05           ` Eli Zaretskii
2005-09-27 22:00           ` Jim Blandy
2005-09-21  4:03     ` Daniel Jacobowitz
2005-09-21 16:56 ` Paul Gilliam
2005-09-23 23:44 ` Stan Shebs
2005-09-13  1:17 Stan Shebs
2005-09-13  3:43 ` Eli Zaretskii
2005-09-14  0:36   ` Stan Shebs
2005-09-14  3:42     ` Eli Zaretskii
2005-09-14 22:34       ` Stan Shebs
2005-09-15  3:37         ` Eli Zaretskii
2005-09-15  5:36           ` Stan Shebs
2005-09-15 15:14             ` Eli Zaretskii
2005-09-15 18:02               ` Jason Molenda
2005-09-15 20:12                 ` Stan Shebs
2005-09-16 10:42                   ` Eli Zaretskii
2005-09-16 14:00                     ` Stan Shebs
2005-09-16 16:22                       ` Eli Zaretskii
2005-09-16 18:03                         ` Stan Shebs
2005-09-16 20:50                           ` Eli Zaretskii
2005-09-23 23:20                             ` Stan Shebs
2005-09-16 17:50                       ` Ian Lance Taylor
2005-09-16 10:43                 ` Eli Zaretskii
2005-09-13 18:11 ` Min Xu (Hsu)
2005-09-13 22:01   ` Jim Blandy
2005-09-14  0:42     ` Stan Shebs
2005-09-16 12:03 ` Ramana Radhakrishnan
