From: "Min Xu (Hsu)"
To: Stan Shebs
Cc: gdb@sources.redhat.com
Date: Tue, 13 Sep 2005 18:11:00 -0000
Subject: Re: Using reverse execution
Message-ID: <20050913181057.GH5161@cs.wisc.edu>
References: <432628AA.2040808@apple.com>
In-Reply-To: <432628AA.2040808@apple.com>

I often find myself "simulating" reverse execution in gdb by recording
the sequence of actions (breakpoints) I issued to reach a specific
point in a program.

I would second the "bookmarks" idea, which was discussed back in May.
To me it is the first step toward reverse execution without needing
any special OS support.

The assumptions there were:

1. The program is deterministic. For a single-threaded program, this
   largely means the inputs are repeatable. GDB may have to record the
   command-line parameters, but the program should need nothing more
   from GDB to repeat its inputs.

2. Re-running the program from the beginning is fast. Therefore, no
   checkpoint mechanism is required.
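
A minimal sketch of replay under these two assumptions, using a toy
interpreter rather than anything from GDB. A bookmark is recorded as a
pair (taken backward branches, program counter) and found again by
re-running from the beginning, in the spirit of the backward-branch
counting described below. The instruction set and the names Bookmark,
PROGRAM, run, and max_steps are all hypothetical, invented for
illustration.

```python
from typing import NamedTuple, Optional


class Bookmark(NamedTuple):
    back_branches: int  # taken backward branches seen so far
    pc: int             # index of the next instruction to execute


# Hypothetical toy program: acc = 0 + 1 + 2 + 3 + 4
PROGRAM = [
    ("set", "i", 0),       # 0: i = 0
    ("set", "acc", 0),     # 1: acc = 0
    ("add", "acc", "i"),   # 2: acc += i
    ("addi", "i", 1),      # 3: i += 1
    ("jlt", "i", 5, 2),    # 4: if i < 5 goto 2 (a backward branch)
    ("halt",),             # 5
]


def run(program, stop: Optional[Bookmark] = None, max_steps: Optional[int] = None):
    """Run from the beginning. Stop just before the bookmarked
    instruction, after max_steps instructions, or at halt.
    Returns (position, copy of the registers)."""
    regs = {}
    pc = 0
    back = 0
    steps = 0
    while True:
        here = Bookmark(back, pc)
        if here == stop or steps == max_steps:
            return here, dict(regs)
        op = program[pc]
        kind = op[0]
        if kind == "halt":
            return here, dict(regs)
        if kind == "set":
            regs[op[1]] = op[2]
            pc += 1
        elif kind == "add":
            regs[op[1]] += regs[op[2]]
            pc += 1
        elif kind == "addi":
            regs[op[1]] += op[2]
            pc += 1
        elif kind == "jlt":
            if regs[op[1]] < op[2]:
                if op[3] <= pc:
                    back += 1  # count only branches that are taken
                pc = op[3]
            else:
                pc += 1
        else:
            raise ValueError(f"unknown instruction: {op!r}")
        steps += 1


# "Debug session": execute 12 instructions, then drop a bookmark.
mark, regs_then = run(PROGRAM, max_steps=12)

# Later: return to the bookmark by re-running from the start.
again, regs_now = run(PROGRAM, stop=mark)
assert again == mark and regs_now == regs_then
```

Between two taken backward branches the program counter in this toy
only moves forward, so the pair names a single point in a deterministic
run. Real code with calls, signals, or threads would need more than
this.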

I imagine the bookmarks laid out along a linear time history of the
program's execution. The user should be able to name them and inspect
variable values previously displayed. Internally, gdb can locate a
bookmark by counting backward branches and setting the appropriate
breakpoint once enough backward branches have been passed.

On Mon, 12 Sep 2005 Stan Shebs wrote:
> Hi all, I've been spending some time recently on the reverse execution
> idea that's been getting some airplay since May or so
> (http://sources.redhat.com/ml/gdb/2005-05/msg00145.html and
> subsequent), and would like to bring people up to date on some of
> my thinking, plus solicit ideas.
>
> The context is Darwin aka Mac OS X native debugging (surprise), and
> the idea is to make it something that any application developer using
> the Xcode environment could use. There are of course a hundred
> practical difficulties (how do you un-execute an IP packet send? how
> do you reverse a billion instructions executed on multi-GHz
> hardware?), so this is more of a research project at this point; a
> real-life usable implementation probably entails extensive kernel
> hacking, but right now we don't know enough even to tell kernel people
> what facilities we want them to add. In this message, I'm going to
> focus on the user model, without trying to tie things down to a
> specific implementation.
>
> So my big question is: what is reverse execution good for? Thinking
> about some of the difficulties I allude to above, it's easy to
> conclude that maybe reverse execution is just "party tricks" - an
> impressive demo perhaps, but not a feature that real-life users would
> ever adopt. Since the May discussion, I've been watching myself while
> debugging other things (like a GDB merge :-) ), and asking "if I had
> reverse execution, when would I use it?".
>
> The thing that jumped out at me most strongly was reverse execution as
> an "undo" facility.
>
> For instance, when stepping through unfamiliar and complicated code,
> it's very common to "next" over an apparently innocuous function, then
> say "oh sh*t" - your data has magically changed for the worse and so
> the function you nexted over must be the guilty party. But it's often
> too late - you've passed by the key bit of wrong code, and need to
> re-run. Much of the time this is OK, and only takes a moment; but if
> your application is complicated (like an iTunes), or if you have a
> complicated sequence of breaks and continues and user input to get to
> the point of interest, re-running starts to get slow and
> error-prone. You may also have a situation where the bug is
> intermittent and not guaranteed to appear on rerunning (if that sounds
> familiar, hold the thought for a moment). So in these kinds of cases,
> what you really want is to undo that last "next" so you can do a "step"
> and descend into the function of unexpected interest.
>
> A similar case might occur with single-stepping through a series of
> calculations. I suspect everybody has at one time or another stepped
> over something like "a *= b;", printed the value of a only to get a
> bogus value, then either mentally calculated a / b or typed in "p a /
> b", as a quick way to recover the last value of a. It would have been
> easier and faster just to back up by one line and print (or watch)
> a. If the calculation is complicated, the manual un-calculate exposes
> the user to blind alleys if the calculation was mistaken. For instance,
> if you try to manually undo some pointer arithmetic, you might
> mentally adjust by chars when you should be adjusting by ints, and
> then be misled because you think that the bug is that the program is
> writing bad stuff into memory, when it's the pointer value that's
> mistaken.
>
> The key tradeoff for reverse execution as an undo facility is complexity
> of rerunning.
> If rerunning is a cheap part of the debugging session,
> then the undo facility is not going to seem that important.
>
> Another use for reverse execution is a more general form of zeroing in
> on a bug. Suppose you have a bogus pointer that was supposedly
> returned by malloc() somewhere earlier in the program's
> execution. That pointer may only sit in a named variable, and the rest
> of the time it is wandering around in various other data structures.
> There's no single thing to watch in this case; it's not the memory being
> pointed to that's the problem, it's that the pointer itself goes "off
> the reservation". So what you want to do is start from the point at
> which you've discovered the pointer is bad (segfault perhaps), watch
> the location holding the bad pointer, and run backwards until either
> a) somebody modifies the pointer in place, or b) the bogus pointer is
> copied from elsewhere, in which case you watch it and continue
> backwards. In many cases you'll get to the bad code sooner than by
> running forwards, where you have to anticipate which malloc will
> produce the later-to-be-scrambled pointer, and generally trace along
> trying to anticipate who's going to do what to it before the bad thing
> happens. (The display of ignored breakpoint hits was introduced as a
> way to speed this up some.) Again, as with undo, the efficiency of
> this process vs re-running depends on whether the actual bug occurs
> closer to the beginning of execution, or closer to the point of
> failure. One could make an argument that most root-cause bugs tend to
> occur closer to failure points than to the beginning of program
> execution, but that's kind of a philosophical point about program
> design for which I have no concrete evidence.
>
> Then there is stepping backwards by instructions to retrace what is
> happening at the machine level.
> I'm less inclined to say this is
> valuable; picking apart registers and raw memory is a rather
> painstaking activity, so slow (at the human level) that the time to
> re-run up to the line in question is usually negligible by
> comparison. Even so, I can see it becoming very natural for a user to
> do a step, see bogus data that simply can't be explained by the source
> line on the screen, do a reverse-step and then multiple stepi's to
> "slo-mo" the calculations of that line's compiled code.
>
> I touched on hard-to-repeat cases briefly above - GDB mavens will
> recognize this as one of the rationales for the tracepoint facility.
> Reverse execution is similar in that once you've gotten the program
> into a state where a problem manifests, you want to poke around in the
> program's immediate past states. Tracepoints however are designed such
> that the user needs to anticipate what data will be interesting;
> sensible in a decoupled remote debugging context, but not so good for
> the data-driven spur-of-the-moment experimentation that is part of a
> productive debugging session. So a working reverse execution gives the
> user freedom to look around a program's entire state while moving up
> and down along the flow of execution. (Ironically, this capability
> might work against good program design, in that it takes away some
> incentive to design a program with repeatable behavior. For instance,
> programs using a random number generator often include machinery to
> display and input RNG seeds, one of the uses being to guarantee
> predictability while re-running under a debugger.)
>
> But will users actually use any of this in real life? "Undo" is pretty
> easy - everybody understands "undo", even nonprogrammers, with many
> GUIs giving it a dedicated keystroke. Tracking data backwards through
> a program is a powerful tool for a tough class of bugs, but as we know
> from past experience, powerful features that are at all hard to use
> are often ignored.
> Single-instruction reverse stepping is conceptually
> simpler, but likely to see more interest from low-level
> developers, and may only be interesting if available for kernel
> debugging and the like. Reproducibility problems crop up regularly, so
> I can see people wanting to use reverse execution after a breakpoint
> sets them down in rarely-executed code.
>
> Once we have an idea of what we think users will want from the
> feature, we'll have a better idea of what characteristics and
> limitations might be acceptable in an implementation.
>
> Stan