From: Stan Shebs
To: gdb@sources.redhat.com
Date: Tue, 13 Sep 2005 01:17:00 -0000
Subject: Using reverse execution
Message-ID: <432628AA.2040808@apple.com>

Hi all,

I've been spending some time recently on the reverse execution idea that's been getting some airplay since May or so (http://sources.redhat.com/ml/gdb/2005-05/msg00145.html and subsequent), and would like to bring people up to date on some of my thinking, plus solicit ideas.

The context is Darwin aka Mac OS X native debugging (surprise), and the idea is to make it something that any application developer using the Xcode environment could use. There are of course a hundred practical difficulties (how do you un-execute an IP packet send?
how do you reverse a billion instructions executed on multi-GHz hardware?), so this is more of a research project at this point; a real-life usable implementation probably entails extensive kernel hacking, but right now we don't know enough even to tell kernel people what facilities we want them to add. In this message, I'm going to focus on the user model, without trying to tie things down to a specific implementation.

So my big question is: what is reverse execution good for? Thinking about some of the difficulties I allude to above, it's easy to conclude that maybe reverse execution is just "party tricks" - an impressive demo perhaps, but not a feature that real-life users would ever adopt. Since the May discussion, I've been watching myself while debugging other things (like a GDB merge :-) ), and asking "if I had reverse execution, when would I use it?".

The thing that jumped out at me most strongly was reverse execution as an "undo" facility. For instance, when stepping through unfamiliar and complicated code, it's very common to "next" over an apparently innocuous function, then say "oh sh*t" - your data has magically changed for the worse and so the function you nexted over must be the guilty party. But it's often too late - you've passed by the key bit of wrong code, and need to re-run. Much of the time this is OK, and only takes a moment; but if your application is complicated (like an iTunes), or if you have a complicated sequence of breaks and continues and user input to get to the point of interest, re-running starts to get slow and error-prone. You may also have a situation where the bug is intermittent and not guaranteed to appear on rerunning (if that sounds familiar, hold the thought for a moment). So in these kinds of cases, what you really want is to undo that last "next" so you can do a "step" and descend into the function of unexpected interest.

A similar case might occur with single-stepping through a series of calculations.
I suspect everybody has at one time or another stepped over something like "a *= b;", printed the value of a only to get a bogus value, then either mentally calculated a / b or typed in "p a / b", as a quick way to recover the last value of a. It would have been easier and faster just to back up by one line and print (or watch) a. If the calculation is complicated, the manual un-calculate exposes the user to blind alleys if the calculation was mistaken. For instance, if you try to manually undo some pointer arithmetic, you might mentally adjust by chars when you should be adjusting by ints, and then be misled because you think that the bug is that the program is writing bad stuff into memory, when it's the pointer value that's mistaken.

The key tradeoff for reverse execution as an undo facility is the complexity of rerunning. If rerunning is a cheap part of the debugging session, then the undo facility is not going to seem that important.

Another use for reverse execution is a more general form of zeroing in on a bug. Suppose you have a bogus pointer that was supposedly returned by malloc() somewhere earlier in the program's execution. That pointer may sit in a named variable only some of the time, and the rest of the time it is wandering around in various other data structures. There's no single thing to watch in this case; it's not the memory being pointed to that's the problem, it's that the pointer itself goes "off the reservation". So what you want to do is start from the point at which you've discovered the pointer is bad (segfault perhaps), watch the location holding the bad pointer, and run backwards until either a) somebody modifies the pointer in place, or b) the bogus pointer is copied from elsewhere, in which case you watch that other location and continue backwards.
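As a rough sketch of that backwards search, here is the a)/b) loop run over a recorded list of writes. This is only an illustration of the user model, not a proposal for how a debugger would implement it; the `Write` record, the `trace_back` function, and the location names are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Write:
    step: int                    # position in the recorded execution (hypothetical)
    dest: str                    # location that was written
    copied_from: Optional[str]   # source location for a plain copy, else None
    where: str                   # label for the code that did the write


def trace_back(log: List[Write], location: str, from_step: int) -> List[Write]:
    """Follow a bad value backwards through a log sorted by step."""
    chain = []
    step = from_step
    while True:
        # "Run backwards" until something writes the watched location.
        hit = next(
            (w for w in reversed(log) if w.step < step and w.dest == location),
            None,
        )
        if hit is None:
            return chain          # ran off the start of the recording
        chain.append(hit)
        if hit.copied_from is None:
            return chain          # case a): modified in place, stop here
        # case b): copied from elsewhere, so watch the source instead
        location, step = hit.copied_from, hit.step


log = [
    Write(1, "node.buf", None, "malloc in make_node"),
    Write(2, "cache[3]", "node.buf", "cache_insert"),
    Write(3, "cache[3]", None, "off-by-one in resize_cache"),
    Write(4, "p", "cache[3]", "lookup"),
]

chain = trace_back(log, "p", from_step=5)
for w in chain:
    print(w.step, w.dest, w.where)
```

Starting from the bad pointer in `p`, the search first lands on the copy out of `cache[3]`, switches to watching `cache[3]`, and stops at the in-place write that scrambled it, without ever visiting the original malloc.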
In many cases you'll get to the bad code sooner than by running forwards, where you have to anticipate which malloc will produce the later-to-be-scrambled pointer, and generally trace along trying to anticipate who's going to do what to it before the bad thing happens. (The display of ignored breakpoint hits was introduced as a way to speed this up some.)

Again, as with undo, the efficiency of this process vs re-running depends on whether the actual bug occurs closer to the beginning of execution, or closer to the point of failure. One could make an argument that most root-cause bugs tend to occur closer to failure points than to the beginning of program execution, but that's kind of a philosophical point about program design for which I have no concrete evidence.

Then there is stepping backwards by instructions to retrace what is happening at the machine level. I'm less inclined to say this is valuable; picking apart registers and raw memory is a rather painstaking activity, so slow (at the human level) that the time to re-run up to the line in question is usually negligible by comparison. Even so, I can see it becoming very natural for a user to do a step, see bogus data that simply can't be explained by the source line on the screen, do a reverse-step and then multiple stepi's to "slo-mo" the calculations of that line's compiled code.

I touched on hard-to-repeat cases briefly above - GDB mavens will recognize this as one of the rationales for the tracepoint facility. Reverse execution is similar in that once you've gotten the program into a state where a problem manifests, you want to poke around in the program's immediate past states. Tracepoints however are designed such that the user needs to anticipate what data will be interesting; sensible in a decoupled remote debugging context, but not so good for the data-driven spur-of-the-moment experimentation that is part of a productive debugging session.
So a working reverse execution gives the user freedom to look around a program's entire state while moving up and down along the flow of execution. (Ironically, this capability might work against good program design, in that it takes away some incentive to design a program with repeatable behavior. For instance, programs using a random number generator often include machinery to display and input RNG seeds, one of the uses being to guarantee predictability while re-running under a debugger.)

But will users actually use any of this in real life? "Undo" is pretty easy - everybody understands "undo", even nonprogrammers, with many GUIs giving it a dedicated keystroke. Tracking data backwards through a program is a powerful tool for a tough class of bugs, but as we know from past experience, powerful features that are at all hard to use are often ignored. Single-instruction reverse stepping is conceptually simpler, but likely to see more interest from low-level developers, and may only be interesting if available for kernel debugging and the like. Reproducibility problems crop up regularly, so I can see people wanting to use reverse execution after a breakpoint sets them down in rarely-executed code.

Once we have an idea of what we think users will want from the feature, we'll have a better idea of what characteristics and limitations might be acceptable in an implementation.

Stan