From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <432628AA.2040808@apple.com>
Date: Tue, 13 Sep 2005 01:17:00 -0000
From: Stan Shebs <shebs@apple.com>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.3) Gecko/20040910
MIME-Version: 1.0
To: gdb@sources.redhat.com
Subject: Using reverse execution
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2005-09/txt/msg00080.txt.bz2

Hi all, I've been spending some time recently on the reverse execution
idea that's been getting some airplay since May or so
(http://sources.redhat.com/ml/gdb/2005-05/msg00145.html and
subsequent), and would like to bring people up to date on some of
my thinking, plus solicit ideas.

The context is Darwin aka Mac OS X native debugging (surprise), and
the idea is to make it something that any application developer using
the Xcode environment could use. There are of course a hundred
practical difficulties (how do you un-execute an IP packet send?  how
do you reverse a billion instructions executed on multi-GHz
hardware?), so this is more of a research project at this point; a
real-life usable implementation probably entails extensive kernel
hacking, but right now we don't know enough even to tell kernel people
what facilities we want them to add. In this message, I'm going to
focus on the user model, without trying to tie things down to a
specific implementation.

So my big question is: what is reverse execution good for? Thinking
about some of the difficulties I allude to above, it's easy to
conclude that maybe reverse execution is just "party tricks" - an
impressive demo perhaps, but not a feature that real-life users would
ever adopt.  Since the May discussion, I've been watching myself while
debugging other things (like a GDB merge :-) ), and asking "if I had
reverse execution, when would I use it?".

The thing that jumped out at me most strongly was reverse execution as
an "undo" facility.

For instance, when stepping through unfamiliar and complicated code,
it's very common to "next" over an apparently innocuous function, then
say "oh sh*t" - your data has magically changed for the worse and so
the function you nexted over must be the guilty party. But it's often
too late - you've passed by the key bit of wrong code, and need to
re-run. Much of the time this is OK, and only takes a moment; but if
your application is complicated (like an iTunes), or if you have a
complicated sequence of breaks and continues and user input to get to
the point of interest, re-running starts to get slow and
error-prone. You may also have a situation where the bug is
intermittent and not guaranteed to appear on rerunning (if that sounds
familiar, hold the thought for a moment). So in these kinds of cases,
what you really want is to undo that last "next" so you can do a "step"
and descend into the function of unexpected interest.

A similar case might occur with single-stepping through a series of
calculations. I suspect everybody has at one time or another stepped
over something like "a *= b;", printed the value of a only to get a
bogus value, then either mentally calculated a / b or typed in "p a /
b", as a quick way to recover the last value of a.  It would have been
easier and faster just to back up by one line and print (or watch)
a. If the calculation is complicated, the manual un-calculate exposes
the user to blind alleys if that manual calculation is mistaken. For
instance,
if you try to manually undo some pointer arithmetic, you might
mentally adjust by chars when you should be adjusting by ints, and
then be misled because you think that the bug is that the program is
writing bad stuff into memory, when it's the pointer value that's
mistaken.
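
As a minimal sketch of that chars-versus-ints slip (the helper names
undo_as_ints and undo_as_chars are made up for illustration, and the
code assumes a flat address space where uintptr_t arithmetic counts
bytes), compare the correct undo of "p += n" on an int pointer with
the mistaken one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the address an int pointer held before
   "p += n".  The compiler scaled n by sizeof(int), so the undo must
   scale by sizeof(int) too. */
uintptr_t undo_as_ints(uintptr_t addr, size_t n)
{
    assert(addr >= n * sizeof(int));   /* guard against wrapping */
    return addr - n * sizeof(int);
}

/* Hypothetical helper: the mistaken mental undo, which steps back by
   n chars (bytes) and so does not reach the original address. */
uintptr_t undo_as_chars(uintptr_t addr, size_t n)
{
    assert(addr >= n);                 /* guard against wrapping */
    return addr - n * sizeof(char);
}
```

With "int buf[8]" and "p = buf + 3", undo_as_ints returns the address
of buf, while undo_as_chars returns an address only 3 bytes below p;
on a machine with 4-byte ints that is in the middle of buf[2], which
is exactly the kind of result that looks like memory corruption.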

The key tradeoff for reverse execution as an undo facility is the
complexity of rerunning. If rerunning is a cheap part of the debugging
session, then the undo facility is not going to seem that important.

Another use for reverse execution is a more general form of zeroing in
on a bug. Suppose you have a bogus pointer that was supposedly
returned by malloc() somewhere earlier in the program's
execution. That pointer may only sometimes sit in a named variable,
and the rest of the time it is wandering around in various other data
structures. There's
no single thing to watch in this case; it's not the memory being
pointed to that's the problem, it's that the pointer itself goes "off
the reservation". So what you want to do is start from the point at
which you've discovered the pointer is bad (segfault perhaps), watch
the location holding the bad pointer, and run backwards until either
a) somebody modifies the pointer in place, or b) the bogus pointer is
copied from elsewhere, in which case you watch it and continue
backwards. In many cases you'll get to the bad code sooner than by
running forwards, where you have to anticipate which malloc will
produce the later-to-be-scrambled pointer, and generally trace along
trying to anticipate who's going to do what to it before the bad thing
happens. (The display of ignored breakpoint hits was introduced as a
way to speed this up some.) Again, as with undo, the efficiency of
this process vs re-running depends on whether the actual bug occurs
closer to the beginning of execution, or closer to the point of
failure. One could make an argument that most root-cause bugs tend to
occur closer to failure points than to the beginning of program
execution, but that's kind of a philosophical point about program
design for which I have no concrete evidence.
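
The watch-and-run-backwards loop can be sketched as a toy model. This
is not a proposal for how GDB would implement it; it assumes a
hypothetical recorded log of stores, and the names store_event and
trace_back are invented for the illustration. Each event says which
slot was written, and which slot the value was copied from (or -1 if
it was computed in place):

```c
/* Hypothetical record of one store.  "dst" is the slot written;
   "src" is the slot the value was copied from, or -1 if the value
   was computed or modified in place. */
struct store_event {
    int  dst;
    int  src;
    long value;
};

/* Walk the log backwards from index "last", watching slot "watched".
   Case (a): a store that modified the watched slot in place ends the
   search, and its index is returned.  Case (b): a copy moves the
   watch to the source slot and the walk continues.  Returns -1 if
   the log holds no store to the watched slot. */
int trace_back(const struct store_event *events, int last, int watched)
{
    for (int i = last; i >= 0; i--) {
        if (events[i].dst != watched)
            continue;
        if (events[i].src < 0)
            return i;              /* (a) modified in place */
        watched = events[i].src;   /* (b) copied; watch the source */
    }
    return -1;
}
```

Given a log where a pointer is stored in slot 0, copied to slot 1,
scrambled in place in slot 1, then copied on to slots 2 and 3,
starting from slot 3 at the end of the log lands on the scrambling
store without ever having to guess which allocation to follow.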

Then there is stepping backwards by instructions to retrace what is
happening at the machine level. I'm less inclined to say this is
valuable; picking apart registers and raw memory is a rather
painstaking activity, so slow (at the human level) that the time to
re-run up to the line in question is usually negligible by
comparison. Even so, I can see it becoming very natural for a user to
do a step, see bogus data that simply can't be explained by the source
line on the screen, do a reverse-step and then multiple stepi's to
"slo-mo" the calculations of that line's compiled code.

I touched on hard-to-repeat cases briefly above - GDB mavens will
recognize this as one of the rationales for the tracepoint facility.
Reverse execution is similar in that once you've gotten the program
into a state where a problem manifests, you want to poke around in the
program's immediate past states. Tracepoints however are designed such
that the user needs to anticipate what data will be interesting;
sensible in a decoupled remote debugging context, but not so good for
the data-driven spur-of-the-moment experimentation that is part of a
productive debugging session. So a working reverse execution gives the
user freedom to look around a program's entire state while moving up
and down along the flow of execution. (Ironically, this capability
might work against good program design, in that it takes away some
incentive to design a program with repeatable behavior. For instance,
programs using a random number generator often include machinery to
display and input RNG seeds, one of the uses being to guarantee
predictability while re-running under a debugger.)
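
The seed machinery mentioned above might look something like this
minimal sketch. The function names choose_seed and fill_random are
made up for the example; srand(), rand() and strtoul() are standard C,
and the standard guarantees that the same seed yields the same rand()
sequence within one implementation:

```c
#include <stdlib.h>

/* Hypothetical helper: use the seed the user typed (for instance a
   command-line argument) if it parses as a decimal number, otherwise
   fall back to a default the program would also display. */
unsigned choose_seed(const char *arg, unsigned fallback)
{
    char *end;
    unsigned long v;

    if (arg == NULL)
        return fallback;
    v = strtoul(arg, &end, 10);
    if (end == arg || *end != '\0')
        return fallback;           /* not a clean number */
    return (unsigned)v;
}

/* Hypothetical helper: fill out[0..n-1] from the given seed, so a
   re-run under the debugger with the same seed sees the same data. */
void fill_random(unsigned seed, int *out, size_t n)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        out[i] = rand();
}
```

Re-running with the seed printed by the failing run reproduces the
same sequence, which is the predictability that reverse execution
would make less necessary.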

But will users actually use any of this in real life? "Undo" is pretty
easy - everybody understands "undo", even nonprogrammers, with many
GUIs giving it a dedicated keystroke. Tracking data backwards through
a program is a powerful tool for a tough class of bugs, but as we know
from past experience, powerful features that are at all hard to use
are often ignored. Single-instruction reverse stepping is conceptually
simpler, but likely to see more interest from low-level
developers, and may only be interesting if available for kernel
debugging and the like. Reproducibility problems crop up regularly, so
I can see people wanting to use reverse execution after a breakpoint
sets them down in rarely-executed code.

Once we have an idea of what we think users will want from the
feature, we'll have a better idea of what characteristics and
limitations might be acceptable in an implementation.

Stan