As you might have noticed over the last week I have dropped in the new
breakpoint stepping framework (or hopefully you didn't notice since I
did try to do it in chunks that all had no detectable functional
behavior changes). This is an update to my previous general stepping
overview http://sourceware.org/ml/frysk/2007-q2/msg00283.html
and a request for review of the next steps with regard to the
instruction parsers, the test framework and what to use as the single
step out of line (ssol) area.

The main difference with the previous stepping overview regarding the
ptrace state machine is that there is now an explicit Stepping state,
which is a sub-state of Running and that Running.sendContinue() now not
only determines how to instruct ptrace to continue the task, but also
returns a different Running state based on whether or not we asked for a
continue or step-continue. This makes interpreting the cause of the trap
event we subsequently expect to get easier. This was previously recorded
as a field in the Task itself.
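
To make the state change concrete, here is a minimal sketch of the idea.
It is not the actual frysk task-state code: the Request enum, the
atBreakpoint flag, the issued list and the returned strings are all made
up for illustration. The point is only that sendContinue() hands back the
state that already knows how to interpret the next trap.

```java
import java.util.List;

// Simplified, hypothetical model; not the actual frysk task-state classes.
class Running {
    /** The request we would pass on to ptrace; names are illustrative. */
    enum Request { CONTINUE, STEP }

    /**
     * Records the request and returns the state to wait in. Stepping is
     * returned when the task sits on a breakpoint and must step first.
     */
    Running sendContinue(boolean atBreakpoint, List<Request> issued) {
        if (atBreakpoint) {
            issued.add(Request.STEP);
            return new Stepping();
        }
        issued.add(Request.CONTINUE);
        return new Running();
    }

    /** In plain Running a trap is taken to be a breakpoint hit. */
    String handleTrapEvent() {
        return "breakpoint hit";
    }
}

/** Sub-state of Running: the trap we expect is the end of our own step. */
class Stepping extends Running {
    @Override
    String handleTrapEvent() {
        return "step done";
    }
}
```

Because the cause of the trap is encoded in the state object, nothing
about it has to be remembered in a Task field.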

With respect to breakpoints the ptrace task states have been adapted to
all call setupSteppingBreakpoint() after a breakpoint hit has been
detected, which makes sure the PC is set up correctly (which previously
could be off by one on some architectures) and to mark the Task as being
at that particular breakpoint (this is still kept as Task field for now)
and to not actually call Breakpoint.prepareStep(), which is now handled
in Running.sendContinue(). In the previous version when reset-breakpoint
stepping was used another task could miss the breakpoint if the Task
moved into a Blocked state before continuing.

All the logic of how to breakpoint step has been moved into the
Breakpoint class that checks the properties of a new Instruction class
object which is created by the Isa through an instruction parser before
the breakpoint is inserted at a given location (previously we
represented instructions just as a byte[]). An Instruction knows how
long the instruction is, which bytes it represents, whether it can be
single stepped out of line and how to set that up given the original pc
location and an alternative address, plus any fixups that are needed to
the Task afterwards (and it has a notion of whether or not the
Instruction can be simulated, but that isn't currently used, see below).
A Breakpoint ties an Instruction to a particular address and Proc (and
Tasks can have zero or more Breakpoints; they share the same Breakpoint
on the same address with other Tasks of a Proc, and when no Task of a
Proc has a Breakpoint at a particular address anymore the Breakpoint is
removed).
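
The sharing rule in the last sentence can be sketched as simple
per-address bookkeeping. This is a hypothetical illustration, not the
frysk implementation: SharedBreakpoints, its method names and the use of
integer task ids are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping sketch; class and method names are not frysk API.
class SharedBreakpoints {
    // address -> ids of the Tasks that currently want a breakpoint there
    private final Map<Long, Set<Integer>> tasksAt = new HashMap<>();

    /** Returns true when this is the first Task, so the breakpoint must be inserted. */
    boolean add(long address, int taskId) {
        Set<Integer> tasks = tasksAt.computeIfAbsent(address, a -> new HashSet<>());
        boolean first = tasks.isEmpty();
        tasks.add(taskId);
        return first;
    }

    /** Returns true when the last Task left, so the breakpoint must be removed. */
    boolean remove(long address, int taskId) {
        Set<Integer> tasks = tasksAt.get(address);
        if (tasks == null || !tasks.remove(taskId)) {
            return false; // this Task had no breakpoint at that address
        }
        if (tasks.isEmpty()) {
            tasksAt.remove(address);
            return true;
        }
        return false;
    }

    /** True while at least one Task still has a breakpoint at the address. */
    boolean isInstalled(long address) {
        return tasksAt.containsKey(address);
    }
}
```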

To step over a Breakpoint, the Running.sendContinue() method first
calls Breakpoint.prepareStep(), then signals ptrace to do a single step
of the Task, putting the Task in the Stepping state, and then in
handleTrapEvent() calls Breakpoint.stepDone(). prepareStep() queries the
capabilities of the Instruction at the pc address and, depending on
that, either sets things up for doing a step out of line, simulating the
instruction (but none of the current Instructions have been set up to do
simulation yet; look at the comment in prepareStep() to see what is
needed to fully enable this option) or resetting the current Instruction
(removing the breakpoint instruction temporarily). Accordingly a
Breakpoint can be in the state NOT_STEPPING, OUT_OF_LINE_STEPPING,
SIMULATE_STEPPING or RESET_STEPPING.
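
As a rough sketch of that decision (not the actual prepareStep() code):
the StepChoice class and its boolean parameters are hypothetical, and the
preference order of out of line first, then simulation, then reset is my
assumption for the illustration.

```java
// Hypothetical sketch; not the actual Breakpoint.prepareStep() code.
class StepChoice {
    /** The four Breakpoint stepping states named above. */
    enum State { NOT_STEPPING, OUT_OF_LINE_STEPPING, SIMULATE_STEPPING, RESET_STEPPING }

    /**
     * Picks how to step past a breakpoint. Preferring out of line over
     * simulation over reset is an assumption made for this sketch.
     */
    static State choose(boolean canExecuteOutOfLine, boolean canSimulate) {
        if (canExecuteOutOfLine) {
            return State.OUT_OF_LINE_STEPPING;
        }
        if (canSimulate) {
            return State.SIMULATE_STEPPING;
        }
        return State.RESET_STEPPING; // always possible, but other Tasks can miss the breakpoint
    }
}
```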

In the case of RESET_STEPPING (which was the only option we supported
before) other Tasks might miss and run past the Breakpoint during the
brief period between the reset, step and reinstall.
Breakpoint.prepareStep() just takes the Instruction bytes and puts them
at the current pc address, and stepDone() reinstates the breakpoint
instruction.
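
A toy model of reset stepping, with a byte array standing in for the
memory of the inferior, shows where the window is. ResetStepSketch and
its breakpointVisible() helper are hypothetical; the only real facts used
are the x86 encodings 0xCC for INT3 and 0x90 for NOP.

```java
import java.util.Arrays;

// Toy model: a byte[] stands in for inferior memory. Not frysk code.
class ResetStepSketch {
    static final byte INT3 = (byte) 0xCC; // x86 breakpoint instruction

    private final byte[] memory;
    private final int address;
    private final byte[] original; // the bytes the breakpoint replaced

    /** Saves the original instruction bytes and inserts the breakpoint. */
    ResetStepSketch(byte[] memory, int address, int length) {
        this.memory = memory;
        this.address = address;
        this.original = Arrays.copyOfRange(memory, address, address + length);
        memory[address] = INT3;
    }

    /** Puts the original instruction back so the Task can step over it. */
    void prepareStep() {
        System.arraycopy(original, 0, memory, address, original.length);
    }

    /** Reinstates the breakpoint instruction after the step. */
    void stepDone() {
        memory[address] = INT3;
    }

    /** False between prepareStep() and stepDone(): other Tasks run past. */
    boolean breakpointVisible() {
        return memory[address] == INT3;
    }
}
```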

When the Instruction supports single step out of line then the
Breakpoint requests an address in the single step out of line area of
the Proc and instructs the Instruction to install itself there for the
current Task by calling Instruction.setupExecuteOutOfLine(). The default
action of setupExecuteOutOfLine() is to set the pc to the given address
and place a copy of the instruction bytes there (although this can be
overridden if an Instruction wants to do something more fancy). When the
task signals the Breakpoint that a step was taken by calling stepDone(),
the Breakpoint calls Instruction.fixupExecuteOutOfLine() with the
original pc and replacement address so any adjustments can be done to
the Task registers. The default action is to just set the pc to the
original pc plus the length of the Instruction just stepped. But
Instructions can override that if more is needed. As an example the RET
instruction doesn't do any fixup (since the only action is setting the
pc to the right location in the first place) and the JMP instruction
sets the pc to original pc plus/minus the difference of the alternate
address and the pc after the single step. Afterward the Breakpoint
returns the used address to the Proc so it can be used by other Tasks
needing to do a single step out of line.
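
The three fixup rules boil down to a bit of pc arithmetic. The sketch
below only illustrates that arithmetic; FixupSketch and its method names
are hypothetical and the real fixupExecuteOutOfLine() works on the Task
registers rather than on plain longs.

```java
// Hypothetical illustration of the pc fixups; not the frysk Instruction API.
class FixupSketch {
    /** Default: continue right after the instruction at its original place. */
    static long defaultFixup(long originalPc, int instructionLength) {
        return originalPc + instructionLength;
    }

    /** RET: the step already put the pc at the return address, keep it. */
    static long retFixup(long pcAfterStep) {
        return pcAfterStep;
    }

    /**
     * Relative JMP: the step moved the pc relative to the alternate
     * address, so apply the same distance to the original pc instead.
     */
    static long relativeJmpFixup(long originalPc, long alternateAddress, long pcAfterStep) {
        return originalPc + (pcAfterStep - alternateAddress);
    }
}
```

For example, a 2 byte relative JMP with displacement +0x10 that
originally sits at 0x1000 and is stepped at 0x8000 leaves the pc at
0x8012, and the fixup turns that into 0x1012, which is where the jump
would have gone from its original location.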

The Proc maintains a single step out of line area pool of addresses that
point to locations that are at least as big as the largest instruction
of the instruction set. The Proc gets this list from the Isa the first
time an address is requested through getOutOfLineAddress(). Currently
this is (for x86 and x86_64) just the address of the main function entry
point (see below). The address is taken out of the pool and the
Breakpoint is responsible for putting it back through doneOutOfLine()
(see above). If no address is currently available the call blocks till
one is available (this was way easier than inventing yet another
TaskState and getting the communication between Proc and Task about this
right; contention is very low, and the longest it takes for an address
to become available is one single instruction step).
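
A minimal sketch of such a blocking pool, assuming plain Java monitors.
Only the two method names come from the description above; the
OutOfLineAddressPool class, the Supplier that stands in for the Isa and
the exception on interrupt are assumptions made for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a blocking address pool; not the frysk Proc code.
class OutOfLineAddressPool {
    private final Supplier<List<Long>> isaAddresses; // stands in for the Isa
    private Deque<Long> free; // filled lazily on first use

    OutOfLineAddressPool(Supplier<List<Long>> isaAddresses) {
        this.isaAddresses = isaAddresses;
    }

    // Only called from synchronized methods, so no extra locking here.
    private Deque<Long> pool() {
        if (free == null) {
            free = new ArrayDeque<>(isaAddresses.get());
        }
        return free;
    }

    /** Takes an address out of the pool, blocking until one is free. */
    synchronized long getOutOfLineAddress() {
        while (pool().isEmpty()) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return pool().removeFirst();
    }

    /** Puts an address back and wakes up any caller waiting for one. */
    synchronized void doneOutOfLine(long address) {
        pool().addLast(address);
        notifyAll();
    }
}
```

With a single address in the pool a second caller simply waits in
getOutOfLineAddress() until the first one calls doneOutOfLine().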

All the above seems to work nicely for x86 and x86_64 (powerpc hasn't
been updated, but as long as an Isa doesn't return Instructions through
an instruction parser that returns single stepping out of line capable
Instructions the old reset-breakpoint stepping is used) with all the old
testcases and a couple of new ones but there are 3 areas of improvement
needed:

- Instruction Parser.  The framework is in place and works for the few
Instructions that are known to the instruction parser, but they are all
hand coded (see IA32InstructionParser which just handles NOP, INT3, RETQ
and one JMP variant, the X8664Instruction just delegates to the IA32 for
now). There don't seem to be libraries available to easily plug in that
would give us the fixup instructions needed. The best available are the
kprobes examples from the linux kernel, which have as drawback that they
are coded to be intimately tied to the kernel/C way of doing things and
only handle instructions found in kernel space. For uprobes this should
have been extended to handle every instruction that can occur in user
space, but I haven't seen that work yet (and it apparently is only
available for x86 and no other architecture at this time). Any
alternatives to look at would be appreciated. Otherwise I need to sit
down with the various instruction manuals and just code it up by hand.
(Bonus points for finding something that would not just give us ssol
fixups but also simulation of instructions when hooked to the registers
and memory of a Task).

- Testsuite.  The best I could come up with for now is TestInstructions
and funit-instructions.c, which is a nice little framework for having
simple assembly instructions labeled in such a way that an 'instruction
address path' can be communicated to the java side so it can do either a
full step through the program or install breakpoints on every
instruction and make sure that every step and/or breakpoint hit takes it
to the right next address. But the assembly part has to be written
completely by hand (and unsurprisingly the simple assembly program
currently used works identically on x86 and x86_64). Trying to reuse the
funit-asm.h generic assembly instructions to make it platform neutral
was really hard, especially since the labels used on the C and inline
assembly side have to be kept in sync and C defines and asm statements
don't seem to mix very well. Also at the moment it is just
using nops and jmps to keep things simple. I guess that in the end, if
we have real instruction parsers we could try to generate them using
that. Ideas or experiences with setting something up to track individual
instructions and getting the right addresses listed appreciated.

- Single Step Out Of Line Address Area.  Currently the Isa (for x86 and
x86_64 at least) just provides one address: the address of the main()
function entry point, taken by just doing:

        Elf elf = new Elf(proc.getExe(), ElfCommand.ELF_C_READ);
        Dwarf dwarf = new Dwarf(elf, DwarfCommand.READ, null);
        DwarfDie die = DwarfDie.getDecl(dwarf, "main");
        return die.getEntryBreakpoints();

This works surprisingly well for a simple first approach, and programs
generally don't reenter their own main() function. But it would be nice
to either find an area that is guaranteed to never be used (again) by
the process, or to map in an executable area in the inferior that is
just used by us (maybe just making the inferior load a dummy shared
library). Again any suggestions welcome.