public inbox for frysk@sourceware.org
* Optimizing watchpoints
@ 2007-09-28 21:21 Phil Muldoon
  2007-09-30 19:10 ` Mark Wielaard
  2007-10-01  1:25 ` Roland McGrath
  0 siblings, 2 replies; 10+ messages in thread
From: Phil Muldoon @ 2007-09-28 21:21 UTC (permalink / raw)
  To: Frysk Hackers

I wasn't going to do this in the first pass at watchpoints, but I might 
as well think on it now.

Given that debug registers are very scarce, and that typically there 
might be a scenario where we could better optimize watching similar 
addresses and ranges, is there an established protocol for this?

I thought about checking addresses and ranges as a simple "we are 
already watching this address in another register and scope". But as 
usual things get hairy in C when you add in pointers (think pointers to 
pointers in structures, and so on).

I suspect the answer here is "No", but I can't help but ask, and open 
the conversation ;)

Regards

Phil

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-09-28 21:21 Optimizing watchpoints Phil Muldoon
@ 2007-09-30 19:10 ` Mark Wielaard
  2007-10-01  1:25 ` Roland McGrath
  1 sibling, 0 replies; 10+ messages in thread
From: Mark Wielaard @ 2007-09-30 19:10 UTC (permalink / raw)
  To: Phil Muldoon; +Cc: Frysk Hackers

Hi Phil,

On Fri, 2007-09-28 at 22:20 +0100, Phil Muldoon wrote:
> Given that debug registers are very scarce, and that typically there 
> might be a scenario where we could better optimize watching similar 
> addresses and ranges, is there an established protocol for this?
> 
> I thought about checking addresses and ranges as a simple "we are 
> already watching this address in another register and scope". But as 
> usual things get hairy in C when you add in pointers (think pointers to 
> pointers in structures, and so on).

We are already doing a simple version of this for low level breakpoints.
For each Code observer added at the proc level there is a simple map,
BreakpointAddresses, that keeps track of whether or not an actual
breakpoint instruction is already there. So whenever a Code observer is
added or removed at a particular address the BreakpointAddresses map is
consulted to see whether an existing one can be reused, or should be
kept, because other Code observers are monitoring the address.

For watchpoints you don't just have simple addresses, but also a range.
That makes your 'overlapping' detection a little less straightforward
than in the Code observer case. But BreakpointAddresses already provides
a simple version that can find breakpoint addresses given a range of
memory: public Iterator getBreakpoints(long from, long till).
Something like this, based on a TreeSet of addresses plus ranges can
probably be used to implement your watchpoint address range overlapping
detection.
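That overlap test could be sketched roughly like this (a hypothetical
helper; the class and method names are made up for illustration and are
not the actual BreakpointAddresses code):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: track watched (address, length) ranges and detect
// overlaps, analogous in spirit to BreakpointAddresses but for ranges.
class WatchpointRanges {
    // start address -> length of the watched range
    private final TreeMap<Long, Long> ranges = new TreeMap<Long, Long>();

    void add(long addr, long len) {
        ranges.put(Long.valueOf(addr), Long.valueOf(len));
    }

    // True if [addr, addr+len) overlaps any registered range.
    boolean overlaps(long addr, long len) {
        for (Map.Entry<Long, Long> e : ranges.entrySet()) {
            long start = e.getKey().longValue();
            long end = start + e.getValue().longValue();
            if (addr < end && start < addr + len)
                return true;
        }
        return false;
    }
}
```

A real version would exploit the TreeMap ordering (e.g. headMap or
floorEntry) instead of a linear scan, but the half-open interval test is
the essential part.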

I wouldn't worry too much about the particulars of hairy C structures.
That should be handled at a higher language level by installing multiple
proc watchpoint observers.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-09-28 21:21 Optimizing watchpoints Phil Muldoon
  2007-09-30 19:10 ` Mark Wielaard
@ 2007-10-01  1:25 ` Roland McGrath
  2007-10-01  8:41   ` Mark Wielaard
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Roland McGrath @ 2007-10-01  1:25 UTC (permalink / raw)
  To: Phil Muldoon; +Cc: Frysk Hackers

You've brought up two issues, which I think each deserve their own separate
thread of discussion.  The second thread is about indirection, or generally
speaking, dynamic specification of watchpoint addresses.  That is a worthy
and interesting subject, but I don't think you need to worry about it now.
For considering the watchpoint implementation per se, we can just talk
about "a watchpoint" as being a request to watch a given fixed address.

There are two aspects of specification that will poke through to the
lowest-level implementation layers.  Those are how you can specify the
address range and whose actions you want to watch.

For the latter, that means an individual thread or a group of threads that
share a set of watchpoints.  Right now, the implementation can only be done
by setting each watchpoint individually on each thread.  But it is likely
that future facilities will be able to share some low-level resources and
interface overhead by treating uniformly an arbitrary subset of threads in
the same address space.  It is also likely to matter whether the chosen subset
is in fact the whole set of all threads in the same address space, and
whether a thread has only the breakpoints shared with its siblings in a
chosen subset, or has those plus additional private breakpoints of its own.
So it's worthwhile to think about how the structure of keeping track of
watchpoints (and other kinds of low-level breakpoints) can reflect those
groupings of threads from the high-level semantic control plane down to the
lowest-level implementation, where the most important sharing can occur.

For the address range specification, the low-level implementation pokes up
to constrain the granularity at which you can usefully specify what you
want to watch.  On the most common machines, it's only naturally-aligned
chunks whose size is some machine-dependent subset of 1, 2, 4, 8 bytes.
(i.e. 1,2,4 for 32-bit x86 kernels, 1,2,4,8 for x86_64 kernels, and 8 for
powerpc and ia64).  Probably future facilities will add page size to that
set of sizes (using a software VM facility rather than hardware features).
So the first step has to be turning the semantic request of a given address
range into one or more aligned-address, size pairs (or if you prefer,
address, mask pairs) of sizes supported at low level.  Then you maintain
the active set in that form, and identify the redundancies as they go in.
Only when the duplicate-free active set changes do you poke the low level.
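That first decomposition step might be sketched like so (a hypothetical
helper, not existing frysk code; it assumes the full 1, 2, 4, 8-byte
size set is supported):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split [addr, addr+len) into naturally-aligned
// chunks whose sizes come from the supported set (here 1, 2, 4, 8 bytes).
// Each result element is a { alignedAddress, size } pair.
class AlignedChunks {
    static List<long[]> split(long addr, long len) {
        List<long[]> out = new ArrayList<long[]>();
        long end = addr + len;
        while (addr < end) {
            long size = 8;
            // Shrink until the chunk is naturally aligned and fits
            // inside the remaining part of the requested range.
            while (size > 1 && ((addr % size) != 0 || addr + size > end))
                size >>= 1;
            out.add(new long[] { addr, size });
            addr += size;
        }
        return out;
    }
}
```

For example, a 6-byte request at 0x1003 decomposes into a 1-byte chunk
at 0x1003, a 4-byte chunk at 0x1004, and a 1-byte chunk at 0x1008; the
duplicate-free active set would then be maintained in that form.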

There is one final aspect of organization to consider.  At the lowest
level, there is a fixed-size hardware resource of watchpoint slots.  When
you set them with ptrace, the operating system just context-switches them
for each thread in the most straightforward way.  So the hardware resource
is yours to decide how to allocate.  However, this is not what we expect to
see in future facilities.  The coming model is that hardware watchpoints
are a shared resource managed and virtualized to a certain degree by the
operating system.  The debugger may be one among several noncooperating
users of this resource, for both per-thread and system-wide uses.  Rather
than having the hardware slots to allocate as you choose, you will specify
what you want in a slot, and a priority, and can get dynamic feedback about
the availability of a slot for your priority.  (For compatibility, ptrace
itself will use that facility to virtualize the demands made by
PTRACE_SET_DEBUGREG and the like.  ptrace uses a known priority number that
is fairly high, so that some system-wide or other background tracing would
have to knowingly intend to interfere with traditional user application use
by choosing an even higher priority.)

In the long run, the way to look at this is not as a set of resources to be
allocated, but as a bag of tricks each with different tradeoffs of
constraints, costs, and dynamic resource contention.  

At one extreme you have single-step, i.e. software watchpoints by storing
the old value, stepping an instruction, and checking if the value in memory
changed.  This has few constraints on specification (only that you can't
distinguish stored-same from no-store, and it's not a mechanism for data
read breakpoints).  It has no resource contention issues at all.  It is
inordinately expensive in CPU time (though a straightforward in-kernel
implementation could easily be orders of magnitude faster than the
traditional debugger experience of implementing this).
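That single-step scheme could be sketched as follows (the SteppableTask
interface and its methods are assumed for illustration; they are not an
actual frysk API):

```java
// Hypothetical sketch of a software data-write watchpoint: remember the
// old value, single-step one instruction, then compare.
interface SteppableTask {
    long peekData(long addr);
    void singleStep();
}

class SoftwareWatchpoint {
    // Step once; return true if the watched word changed.
    // Note: a store of the same value is indistinguishable from no
    // store, and this cannot implement data *read* breakpoints.
    static boolean stepAndCheck(SteppableTask task, long addr) {
        long oldValue = task.peekData(addr);
        task.singleStep();
        return task.peekData(addr) != oldValue;
    }
}
```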

Hardware watchpoints have some precise constraints and they compete for a
very limited dynamic resource, but they are extremely cheap in CPU time.

In the future we might have the option of VM tricks.  Those have their own
constraints (page granularity on addresses), they consume an important
dynamic resource that is finite but often not too constrained in practice
(page tables), and they have a somewhat complex balance of cost.

When I talk about cost, I mean roughly the setup overhead, plus the
exception overhead of a hit, plus the overhead of taking and sorting out
any false positive hits (due to the granularity of the facility you choose
being coarser than what you're actually aiming for).  For cases like VM
tricks, this takes in the complex set of factors affecting scaling and
secondary overhead (page table locking, TLB, cache, etc).

To satisfy a set of requests percolated down from the higher levels, you
take those requirements, your bag of tricks, and each trick's particular
tradeoffs for each case, and figure out dynamically what to do.  For tricks
that depend on resources with priority allocations, i.e. hardware
watchpoints, you have to rate the cost-effectiveness of the next-best trick
you could use instead, somehow scaled by how badly you want to get this
done properly, to come up with the right priority number to use so as best
to express decent global tradeoffs among the competing needs in the system.

In a simple example, you can do a one-byte data write watchpoint on powerpc
but only as a two-tier operation.  You set the hardware to catch stores to
the aligned 8-byte word containing the target byte address.  You get a
fault before the store happens, but you only know it will store to one or
more of those 8 bytes.  So, you can save the old 8 bytes, disable the
hardware breakpoint, single-step, and compare the new 8 bytes to the old to
decide which smaller-granularity data-change watchpoints to say have hit.
This is a pretty good tradeoff, since while the total overhead of a hit is
at least twice that of an optimal case (aligned 8-byte watchpoint), the
likely rate of false positives is quite low.  (Compared to "sorry, just
can't do it", that's really quite good.)
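The final compare step of that two-tier dance might look roughly like
this (a hypothetical helper, assuming little-endian byte numbering
within the watched 8-byte word):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: after single-stepping over the faulting store,
// compare the old and new 8-byte words to decide which byte-granularity
// watchpoints actually changed (byte offsets 0..7 within the word,
// offset 0 being the least significant byte).
class ByteChangeDetector {
    static List<Integer> changedBytes(long oldWord, long newWord) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < 8; i++) {
            long mask = 0xffL << (8 * i);
            if ((oldWord & mask) != (newWord & mask))
                hits.add(Integer.valueOf(i));
        }
        return hits;
    }
}
```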

In an oft-imagined future example, VM tricks give an effectively unlimited
resource in how many watchpoints (of that kind) you can have installed
simultaneously.  Even if you can only have one word-granularity watchpoint,
or four, or none, you can set lots of page-granularity watchpoints.  You
can dynamically track the rate of false-positives engendered by those (page
faults you have to examine and let go), potentially feeding a complex
estimate of the cost of maintaining particular watchpoints in the page
tables.  You can install many page-level watchpoints to start, then respond
to their hits and to the dynamic feedback of hardware watchpoint slot
pressure to choose the hottest watchpoints and give them the hardware
slots.  In the absence of hardware slots, this might always lead to
single-step over stores to watched pages, or to declaring the situation too
costly and alerting the user we had to give up.

The trivial case is that you request at low level exactly one watchpoint
for what the user needs, with normal ptrace-equivalent priority.  If the
slot is (or becomes) unavailable, you tell the user "no can do", end of story.
(In today's implementation based on ptrace, this is what the low level
always has to boil down to.)


This view of the whole bag of tricks should be unified across all sorts of
low-level breakpoint resources, i.e. instruction breakpoints as well as
data read/write breakpoints (aka watchpoints).  Normal breakpoint insertion
and related techniques are more of the tricks in the bag; like some of the
hardware features mentioned earlier, they are very constrained in their
kind of specification (an exact instruction address).  

The most common case is one with no watchpoints at all, but just a few
instruction breakpoints.  On x86 and ia64, the same hardware features used
for data breakpoints work for these and are orders of magnitude less costly
(and far less complicated) than even the sexiest hypothetical breakpoint
assistance techniques.  When the pressure for those resources allows, it's
always optimal to use hardware breakpoints in preference to memory insertion.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-10-01  1:25 ` Roland McGrath
@ 2007-10-01  8:41   ` Mark Wielaard
  2007-10-01  9:11     ` Phil Muldoon
  2007-10-01 17:40     ` Roland McGrath
  2007-10-01 17:54   ` Phil Muldoon
  2007-10-10  7:11   ` Phil Muldoon
  2 siblings, 2 replies; 10+ messages in thread
From: Mark Wielaard @ 2007-10-01  8:41 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Phil Muldoon, Frysk Hackers

Hi Roland,

On Sun, 2007-09-30 at 18:25 -0700, Roland McGrath wrote:
> (For compatibility, ptrace
> itself will use that facility to virtualize the demands made by
> PTRACE_SET_DEBUGREG and the like.  ptrace uses a known priority number that
> is fairly high, so that some system-wide or other background tracing would
> have to knowingly intend to interfere with traditional user application use
> by choosing an even higher priority.)

Just a FYI. I see (through a quick kernel grep) PTRACE_SET_DEBUGREG is
only available on powerpc. For x86[_64] frysk pokes at the hardware
debug registers through the USR area and getting/setting them
"directly". This might have to become a special case of the above.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-10-01  8:41   ` Mark Wielaard
@ 2007-10-01  9:11     ` Phil Muldoon
  2007-10-01  9:20       ` Mark Wielaard
  2007-10-01 17:40     ` Roland McGrath
  1 sibling, 1 reply; 10+ messages in thread
From: Phil Muldoon @ 2007-10-01  9:11 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Roland McGrath, Frysk Hackers

Mark Wielaard wrote:
> Hi Roland,
>
> On Sun, 2007-09-30 at 18:25 -0700, Roland McGrath wrote:
>   
>> (For compatibility, ptrace
>> itself will use that facility to virtualize the demands made by
>> PTRACE_SET_DEBUGREG and the like.  ptrace uses a known priority number that
>> is fairly high, so that some system-wide or other background tracing would
>> have to knowingly intend to interfere with traditional user application use
>> by choosing an even higher priority.)
>>     
>
> Just a FYI. I see (through a quick kernel grep) PTRACE_SET_DEBUGREG is
> only available on powerpc. For x86[_64] frysk pokes at the hardware
> debug registers through the USR area and getting/setting them
> "directly". This might have to become a special case of the above.
>
>   
Do you know whether this is a hardware/OS difference, or just a 
different implementation in the Frysk PPC ISA?

Regards

Phil

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-10-01  9:11     ` Phil Muldoon
@ 2007-10-01  9:20       ` Mark Wielaard
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Wielaard @ 2007-10-01  9:20 UTC (permalink / raw)
  To: Phil Muldoon; +Cc: Roland McGrath, Frysk Hackers

Hi Phil,

On Mon, 2007-10-01 at 10:11 +0100, Phil Muldoon wrote:
> Mark Wielaard wrote:
> > Just a FYI. I see (through a quick kernel grep) PTRACE_SET_DEBUGREG is
> > only available on powerpc. For x86[_64] frysk pokes at the hardware
> > debug registers through the USR area and getting/setting them
> > "directly". This might have to become a special case of the above.
> >  
> Is this a hardware/OS difference do you know, or just a different 
> implementation in the Frysk PPC ISA?

A quick look at the kernel sources makes me believe that this is a
kernel/ptrace difference in how to access debug registers between
different architectures. The PPC ISA in Frysk doesn't yet support
hardware debug register access. But my guess is that it just means it
would use a different bank, while on x86 we just see the debug registers
as part of the USR bank.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-10-01  8:41   ` Mark Wielaard
  2007-10-01  9:11     ` Phil Muldoon
@ 2007-10-01 17:40     ` Roland McGrath
  1 sibling, 0 replies; 10+ messages in thread
From: Roland McGrath @ 2007-10-01 17:40 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Phil Muldoon, Frysk Hackers

> Just a FYI. I see (through a quick kernel grep) PTRACE_SET_DEBUGREG is
> only available on powerpc. For x86[_64] frysk pokes at the hardware
> debug registers through the USR area and getting/setting them
> "directly". This might have to become a special case of the above.

"and the like".  (I already give more precise details than anyone wants
anyway.)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-10-01  1:25 ` Roland McGrath
  2007-10-01  8:41   ` Mark Wielaard
@ 2007-10-01 17:54   ` Phil Muldoon
  2007-10-10  7:11   ` Phil Muldoon
  2 siblings, 0 replies; 10+ messages in thread
From: Phil Muldoon @ 2007-10-01 17:54 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Frysk Hackers

Roland McGrath wrote:
> You've brought up two issues, which I think each deserve their own separate
> thread of discussion.  The second thread is about indirection, or generally
> speaking, dynamic specification of watchpoint addresses.  That is a worthy
> and interesting subject, but I don't think you need to worry about it now.
> For considering the watchpoint implementation per se, we can just talk
> about "a watchpoint" as being a request to watch a given fixed address.
>   

Thanks for the detailed email. I'm still absorbing the information 
imparted here, and will, in my usual fashion, come back with questions 
later as things make more sense. I get the impression that watchpoints 
themselves are straightforward enough, but managing them and dealing 
with the arch-specific edge cases is a tricky situation.

Regards

Phil
>   

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-10-01  1:25 ` Roland McGrath
  2007-10-01  8:41   ` Mark Wielaard
  2007-10-01 17:54   ` Phil Muldoon
@ 2007-10-10  7:11   ` Phil Muldoon
  2007-10-10 10:28     ` Mark Wielaard
  2 siblings, 1 reply; 10+ messages in thread
From: Phil Muldoon @ 2007-10-10  7:11 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Frysk Hackers

Roland McGrath wrote:
> For the latter, that means an individual thread or a group of threads that
> share a set of watchpoints.  Right now, the implementation can only be done
> by setting each watchpoint individually on each thread.  But it is likely
> that future facilities will be able to share some low-level resources and
> interface overhead by treating uniformly an arbitrary subset of threads in
> the same address space.  

Ideally, from an api perspective, I'd like both. In the past, I always 
found it useful to watch every thread in a process to see which one was 
clobbering a memory address. However, I would still like to preserve 
single-thread watchpoints from a user (Frysk) api perspective.

> It is also likely to matter whether the chosen subset
> is in fact the whole set of all threads in the same address space, and
> whether a thread has only the breakpoints shared with its siblings in a
> chosen subset, or has those plus additional private breakpoints of its own.
> So it's worthwhile to think about how the structure of keeping track of
> watchpoints (and other kinds of low-level breakpoints) can reflect those
> groupings of threads from the high-level semantic control plane down to the
> lowest-level implementation, where the most important sharing can occur.
>   

Right now (correct me if I'm wrong here, Mark), we do "software" code 
breakpoints via single-stepping and none of the limited debug registers 
are used for hardware code breakpoints. I guess the question here is 
whether we ever will, whether any design should anticipate and 
accommodate that, or whether we should just "rewrite as necessary". 
For now I am going to take the latter approach, and pretend the former 
will never exist, at least in Frysk.

> There is one final aspect of organization to consider.  At the lowest
> level, there is a fixed-size hardware resource of watchpoint slots.  When
> you set them with ptrace, the operating system just context-switches them
> for each thread in the most straightforward way.  So the hardware resource
> is yours to decide how to allocate.  However, this is not what we expect to
> see in future facilities.  The coming model is that hardware watchpoints
> are a shared resource managed and virtualized to a certain degree by the
> operating system.  The debugger may be one among several noncooperating
> users of this resource, for both per-thread and system-wide uses.  Rather
> than having the hardware slots to allocate as you choose, you will specify
> what you want in a slot, and a priority, and can get dynamic feedback about
> the availability of a slot for your priority.  (For compatibility, ptrace
> itself will use that facility to virtualize the demands made by
> PTRACE_SET_DEBUGREG and the like.  ptrace uses a known priority number that
> is fairly high, so that some system-wide or other background tracing would
> have to knowingly intend to interfere with traditional user application use
> by choosing an even higher priority.)
>   

This is where I see the largest change in Frysk's implementation now, 
and where it will change in the future with utrace; it would be worth 
putting this setting and getting in a fairly abstract class that can 
be reslotted depending on the implementation. This is where I have been 
currently spending a lot of my thinking time. Right now, the debug 
registers will be populated via Frysk's register access routines which 
are themselves being refactored. The ptrace peek and poke is abstracted 
from the code, and just a simple set/get will be performed via the Frysk 
functions to populate and read the debug registers. But as you mention, 
it appears in the utrace world that this will be taken from the 
(abstracted) ptrace user and managed by the kernel. For the purposes of 
context on this list, is that hardware watchpoint design set in stone 
with utrace now, and would it be safe to lay plans based on that?

> At one extreme you have single-step, i.e. software watchpoints by storing
> the old value, stepping an instruction, and checking if the value in memory
> changed.  This has few constraints on specification (only that you can't
> distinguish stored-same from no-store, and it's not a mechanism for data
> read breakpoints).  It has no resource contention issues at all.  It is
> inordinately expensive in CPU time (though a straightforward in-kernel
> implementation could easily be orders of magnitude faster than the
> traditional debugger experience of implementing this).
>   

Conceptually (again, correct me if I am wrong, Mark/Tim) this is 
what we do with Code breakpoints, so adding a software watchpoint would 
be a modification of that code, and the hardware watchpoints - at least 
at the engine level - would be a separate implementation. The user may 
or may not know whether they are assigning a hardware or software 
watchpoint, depending on the tunability given to them. 
However, I have no plans for software watchpoints at this moment.

> Hardware watchpoints have some precise constraints and they compete for a
> very limited dynamic resource, but they are extremely cheap in CPU time.
>   

Yes, and they seem to change between processor model revisions too. Fun! 
Anyway, I'm still working on the bag of tricks for optimizing 
watchpoints. But I just wanted to respond to the first part of the 
email at a wider scope, and open it up for comments about my long-term 
intentions. I'll comment on the second part of your email later.

Regards

Phil


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Optimizing watchpoints
  2007-10-10  7:11   ` Phil Muldoon
@ 2007-10-10 10:28     ` Mark Wielaard
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Wielaard @ 2007-10-10 10:28 UTC (permalink / raw)
  To: Phil Muldoon; +Cc: Roland McGrath, Frysk Hackers

On Wed, 2007-10-10 at 08:11 +0100, Phil Muldoon wrote:
> Roland McGrath wrote:
> > For the latter, that means an individual thread or a group of threads that
> > share a set of watchpoints.  Right now, the implementation can only be done
> > by setting each watchpoint individually on each thread.  But it is likely
> > that future facilities will be able to share some low-level resources and
> > interface overhead by treating uniformly an arbitrary subset of threads in
> > the same address space.  
> 
> Ideally from an api perspective, I'd like both. In the past, I always 
> found it useful to watch every thread in a process to see which one was 
> clobbering this memory address. However I would still like to preserve 
> single thread watchpoints from a user (Frysk) api perspective.

You can of course simulate one with the other on the frysk.proc Observer
level. It is a good idea to keep the performance in mind when offering
options to observe on a single Task or whole Proc level. But even if the
underlying kernel/hardware interface only offers one, you can/should
offer the other (either by setting a watchpoint for each Task in a set,
or by filtering out watchpoints events for Tasks set on a Proc level
that the user isn't interested in). In fact we made the mistake with
Code observers to let them always trigger on a Proc wide basis (since
the underlying mechanism works by setting software breakpoints which are
always triggered for all Tasks in the Proc) even if they were registered
for only one Task. http://sourceware.org/bugzilla/show_bug.cgi?id=4895
(See also the earlier Task vs Proc wide Observer discussions on the
list.)

> Right now (correct me if I wrong here Mark), we do "software" code 
> breakpoints via single-stepping and none of the limited debug registers 
> are used for hardware code breakpoints.

Yes, you are right, we only do "software" breakpoints, not "hardware"
breakpoints at the moment. We insert breakpoint instructions into the
code stream, and when one is hit we continue past the breakpoint by
either simulating the instruction (not fully implemented), placing a
copy of the instruction "out-of-line", stepping that and fixing up any
registers (only done for a few instructions we know about; a full
instruction parser is needed to complete it), or placing back
("resetting") the original instruction, stepping the Task, and putting
the breakpoint instruction back (this is fully implemented, but risks
missing the breakpoint in other running Tasks - gdb works around that by
temporarily stopping the world and only then doing the
reset-step-one-task dance).
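That last strategy could be sketched like this (the Task interface and
its methods are assumed for illustration, not the actual frysk API; the
0xcc opcode is the x86 int3 breakpoint instruction):

```java
// Hypothetical sketch of the "reset, step, put back" strategy: restore
// the original instruction, single-step just this task, then reinsert
// the breakpoint instruction.
interface Task {
    byte peekCode(long addr);
    void pokeCode(long addr, byte insn);
    void singleStep();
}

class ResetStepPutBack {
    static final byte BREAKPOINT_INSN = (byte) 0xcc; // x86 int3

    // Continue one task past a software breakpoint at addr.  Racy:
    // other running tasks can miss the breakpoint while the original
    // instruction is temporarily restored, which is why gdb stops the
    // world around this dance.
    static void stepOver(Task task, long addr, byte originalInsn) {
        task.pokeCode(addr, originalInsn);    // reset the original insn
        task.singleStep();                    // step just this task
        task.pokeCode(addr, BREAKPOINT_INSN); // put the breakpoint back
    }
}
```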

We really should also support hardware based breakpoints since they are
way more efficient. But as you say they are a limited resource, whether
or not (and how) to expose that to the user on the frysk.proc Observer
level or just fall back to a less efficient (software based) breakpoint
is an open question.

> > There is one final aspect of organization to consider.  At the lowest
> > level, there is a fixed-size hardware resource of watchpoint slots.  When
> > you set them with ptrace, the operating system just context-switches them
> > for each thread in the most straightforward way.  So the hardware resource
> > is yours to decide how to allocate.  However, this is not what we expect to
> > see in future facilities.  The coming model is that hardware watchpoints
> > are a shared resource managed and virtualized to a certain degree by the
> > operating system.  The debugger may be one among several noncooperating
> > users of this resource, for both per-thread and system-wide uses.  Rather
> > than having the hardware slots to allocate as you choose, you will specify
> > what you want in a slot, and a priority, and can get dynamic feedback about
> > the availability of a slot for your priority.

This is interesting. Do you also foresee that threads of a process that
share the same processor can more easily share their breakpoints? That
is, could the debugger indicate that it would like to change the task
cpu-affinity for that?

> This is where I see the largest change in Frysk's implementation now, 
> and where it will change in the future with utrace; and it would do to 
> make this setting and getting stuff in a fairly abstract class that can 
> be reslotted depending on implementation. This is where I have been 
> currently spending a lot of my thinking time. Right now, the debug 
> registers will be populated via Frysk's register access routines which 
> are themselves being refactored. The ptrace peek and poke is abstracted 
> from the code, and just a simple set/get will be performed via the Frysk 
> functions to populate and read the debug registers. But as you mention, 
> it appears in the utrace world that this will be taken from the 
> (abstracted) ptrace user and managed by the kernel. For the purposes of 
> context on this list, is that hardware watchpoint design set in stone 
> with utrace now, and would it be safe to lay plans based on that?

Are you and Chris working together on the utrace abstraction layer? Or
is the frysk-utrace completely separate from this effort?

> > At one extreme you have single-step, i.e. software watchpoints by storing
> > the old value, stepping an instruction, and checking if the value in memory
> > changed.  This has few constraints on specification (only that you can't
> > distinguish stored-same from no-store, and it's not a mechanism for data
> > read breakpoints).  It has no resource contention issues at all.

And it would seem to be the only option if you want to watch values
stored in registers.

>   It is
> > inordinately expensive in CPU time (though a straightforward in-kernel
> > implementation could easily be orders of magnitude faster than the
> > traditional debugger experience of implementing this).

A shared in-kernel breakpoint/watchpoint framework with for example the
systemtap project would be ideal!

> Conceptually (again correct me if I am wrong again, Mark/Tim) this is 
> what we do with Code breakpoints, so adding a software watchpoint would 
> be a modification of that code, and the hardware watchpoints - at least 
> at the engine level - would be separate implementation.

Yes, although it is currently abstracted at the frysk.proc.Instruction
level. Each Instruction knows whether it can be simulated,
stepped out-of-line, or needs to be reset/put back in the original
instruction stream to continue past a breakpoint. It shouldn't be too
hard to abstract it at the frysk.proc.Breakpoint level however (I did
that before, but there were too many unknowns to come up with a good
abstract design without knowing what actual implementations would look
like).

Cheers,

Mark

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-10-10 10:28 UTC | newest]

Thread overview: 10+ messages
-- links below jump to the message on this page --
2007-09-28 21:21 Optimizing watchpoints Phil Muldoon
2007-09-30 19:10 ` Mark Wielaard
2007-10-01  1:25 ` Roland McGrath
2007-10-01  8:41   ` Mark Wielaard
2007-10-01  9:11     ` Phil Muldoon
2007-10-01  9:20       ` Mark Wielaard
2007-10-01 17:40     ` Roland McGrath
2007-10-01 17:54   ` Phil Muldoon
2007-10-10  7:11   ` Phil Muldoon
2007-10-10 10:28     ` Mark Wielaard
