From mboxrd@z Thu Jan  1 00:00:00 1970
From: jtc@redback.com (J.T. Conklin)
To: Stephane Carrez <ciceron@sunchorus.France.Sun.COM>
Cc: gdb@sourceware.cygnus.com
Subject: Re: additional vector function to improve  register fetch performance
Date: Mon, 11 Dec 2000 15:21:00 -0000
Message-id: <5mhf4apmry.fsf@jtc.redback.com>
References: <200012081053.LAA22829@sunchorus.France.Sun.COM>
X-SW-Source: 2000-12/msg00066.html

>>>>> "Stephane" == Stephane Carrez <ciceron@sunchorus.France.Sun.COM> writes:
>> While we're talking about register fetch and stores, I had an idea the
>> other night about improving performance by adding some new functions
>> to the target vector.

Stephane> For gdb for ChorusOS (system debug), I'm doing some prefetch
Stephane> to improve performance. For each processor, I've defined a
Stephane> list of "gdb important registers". I retrieve the complete
Stephane> list when gdb asks for one of them.
Stephane>
Stephane> For example for Sparc, I've defined the following list:
Stephane>
Stephane>  	sp, pc, rp, fp
Stephane>
Stephane> So, when gdb asks for either sp or pc, I retrieve these 4
Stephane> registers.
Stephane>
Stephane> In general, I've observed this is enough. For most
Stephane> processors Gdb needs the sp, pc and fp (in general after
Stephane> each stop). I'm using this for sparc, x86 and ppc.

I know well that providing a handful of registers can dramatically
improve performance.  

The GDB remote protocol has a mechanism by which the values of key
registers can be returned along with the exception/signal number
whenever execution stops.  Long ago I submitted a patch to add the pc,
sp, and fp to the sample i386 and m68k stubs.  I don't know what ever
happened to it; I should probably dust it off and submit it again.
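
For illustration, here is a minimal sketch in C of how a stub might
build such a stop reply (the "T" packet: a two-digit hex signal number
followed by "regnum:value;" pairs, with each value in target byte
order).  The register numbers, the 4-byte register size, and the
function names are assumptions made up for the example, not taken from
any real stub; packet framing ("$", "#", checksum) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical register numbers; the real ones depend on the target's
   GDB register layout.  */
enum { REG_FP = 14, REG_SP = 15, REG_PC = 17 };

/* Append "NN:<hex bytes>;" for one register.  VALUE holds the register
   contents, already in target byte order.  Returns the new write
   position.  */
static char *
append_reg (char *p, int regno, const unsigned char *value, size_t len)
{
  p += sprintf (p, "%02x:", regno);
  for (size_t i = 0; i < len; i++)
    p += sprintf (p, "%02x", value[i]);
  *p++ = ';';
  *p = '\0';
  return p;
}

/* Build the body of a "T" stop reply carrying the signal number plus
   fp, sp and pc (assumed to be 4 bytes each).  BUF must have room for
   at least 40 bytes.  Returns the length of the reply.  */
static size_t
build_stop_reply (char *buf, int signo, const unsigned char *fp,
                  const unsigned char *sp, const unsigned char *pc)
{
  char *p;

  assert (signo >= 0 && signo <= 0xff);
  p = buf + sprintf (buf, "T%02x", signo);
  p = append_reg (p, REG_FP, fp, 4);
  p = append_reg (p, REG_SP, sp, 4);
  p = append_reg (p, REG_PC, pc, 4);
  return (size_t) (p - buf);
}
```

With signal 5 and the byte sequences 00 00 10 00 (fp), 00 00 0f f0 (sp)
and 00 00 20 04 (pc), this produces
"T050e:00001000;0f:00000ff0;11:00002004;", so GDB has all three
registers without a further round trip.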

I'm not sure that your suggestion to prefetch all "gdb important
registers" when any in that set are read is generally applicable.
While this kind of optimization can be done in the target layer
reasonably easily for those GDB targets that are only used by one
target system, it is much more difficult for GDB targets that are
"generic" like remote.c.

One problem is that this breaks the abstraction layer between
target-independent and target-specific code.  I admit that this is a
fuzzy line, but having remote.c (for example) know about "important
regs" clearly breaks it (IMO).

Another problem is that the target layer doesn't know what registers
are already stored in the regcache.  Handling that would require
additions to the regcache API so that the target could query it and
fetch only those registers that are not already cached.

Finally, this would have to be added to more than one target.  IMO,
this makes it clear that a mechanism for handling "important regs" in
the target layer is flawed.  If the idea is pursued, it belongs above
the target layer.  This would make a fetch of any of the "important
regs" a prefetch for all of the others in the set.
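
A minimal sketch of that idea in C, placed above the target layer where
the cache state is known.  Everything here is hypothetical: the
register numbers in the "important" list, the flat validity array
standing in for the regcache, and the function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGS 32

/* Hypothetical per-architecture list of "important" registers.  */
static const int important_regs[] = { 1, 14, 15, 30 };
#define NUM_IMPORTANT (sizeof important_regs / sizeof important_regs[0])

static bool
is_important (int regno)
{
  for (size_t i = 0; i < NUM_IMPORTANT; i++)
    if (important_regs[i] == regno)
      return true;
  return false;
}

/* Decide which registers to request when REGNO is read.  VALID[] is
   the cache's per-register validity flag.  Writes the register numbers
   to OUT (room for NUM_REGS entries) and returns how many were
   written.  */
static size_t
collect_prefetch_set (int regno, const bool valid[NUM_REGS],
                      int out[NUM_REGS])
{
  size_t n = 0;

  assert (regno >= 0 && regno < NUM_REGS);
  if (valid[regno])
    return 0;                   /* Already cached: nothing to fetch.  */

  if (!is_important (regno))
    {
      out[n++] = regno;         /* Ordinary register: fetch it alone.  */
      return n;
    }

  /* Important register: request every uncached member of the set.  */
  for (size_t i = 0; i < NUM_IMPORTANT; i++)
    if (!valid[important_regs[i]])
      out[n++] = important_regs[i];
  return n;
}
```

Because the decision is made where the cache lives, the target never
has to ask what is already valid; it is simply handed the list of
registers that are still missing.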

This could be implemented reasonably cleanly, and does not suffer from
the problem you alluded to elsewhere in your response: that you don't
know when a register is going to be needed.  Adding a bunch of explicit
prefetch hints may not help if you miss one critical one.

But I still think that there needs to be some way for GDB to tell the
target layer that more than one register is going to be fetched.  It
won't matter for those targets that fetch all registers at once or
those that can only fetch one at a time, but those that can fetch an
arbitrary set (or even those that can both fetch all and fetch one)
can't take advantage of that unless they know which registers will be
fetched.  Otherwise the latency of a full register set fetch or
multiple single register fetches is still a problem.

>> * target_prefetch_register()
>> 
>> With the register cache and targets that always fetch the entire
>> register set, fetch performance is as good as can be expected.  But
>> with a target that can fetch one register at a time, GDB will issue
>> multiple single register fetches.  Due to command/response latency,
>> this has a significant performance impact.
>> 
>> One way this could be addressed is to always fetch the entire
>> register set.  The remote protocol is like this: while it can set a
>> single register, there is no command to fetch one.  This approach
>> may lose when the register set is large and the number of registers
>> to be fetched is small; it may be possible to issue several single-
>> register fetches in the time for one for the entire register set.

Stephane> Fetching all registers is a killer for PPC... In general,
Stephane> only the pc, lr and sp are used by Gdb (ok, except for
Stephane> arguments/locals).

This would be a problem even if the remote protocol supported a
read-single-register command, since GDB would fetch each register
sequentially.  Whether to fetch the entire set or to fetch individual
registers depends on many factors: the number of registers, the size
of the register set, and the bandwidth and latency of the debug channel.
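
To make the trade-off concrete, here is a rough cost model in C.  It
is only a sketch under stated assumptions: one round trip costs a fixed
latency plus the transfer time of its payload, register data is hex
encoded (two wire characters per byte), and single-register fetches
are issued sequentially.  The struct, its fields, and the function
names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Parameters of the debug channel and register set.  All values fed
   in are illustrative assumptions, not measurements.  */
struct link_model
{
  double latency_s;             /* Command/response round-trip latency.  */
  double bytes_per_s;           /* Usable channel bandwidth.  */
  size_t packet_overhead;       /* Framing bytes per request/reply pair.  */
  size_t reg_size;              /* Bytes per register.  */
  size_t num_regs;              /* Registers in the full set.  */
};

/* Time for one round trip carrying PAYLOAD register bytes, assuming
   the payload is hex encoded (two wire characters per byte).  */
static double
round_trip_time (const struct link_model *m, size_t payload)
{
  size_t wire_bytes = m->packet_overhead + 2 * payload;

  return m->latency_s + (double) wire_bytes / m->bytes_per_s;
}

/* Return true if one full-set fetch is estimated to be faster than
   WANTED sequential single-register fetches.  */
static bool
prefer_fetch_all (const struct link_model *m, size_t wanted)
{
  assert (m->bytes_per_s > 0.0);
  double all = round_trip_time (m, m->reg_size * m->num_regs);
  double singles = (double) wanted * round_trip_time (m, m->reg_size);

  return all < singles;
}
```

For three 4-byte registers out of a 100-register set, a slow but
low-latency serial line (1 ms, 11520 bytes/s) favours the single
fetches, while a fast but high-latency link (100 ms, 1 MB/s) favours
fetching everything in one go.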

Fortunately, as I mentioned earlier, the remote protocol allows for
register values to be returned when execution stops.  Before I added
support to our powerpc stub, "step", "next", etc. were painful.
Ideally, GDB would also be able to take advantage of a
step-out-of-range command, but that's a different project.

Stephane> I like the idea of pre-fetching but I wouldn't introduce a
Stephane> new target vector for that. The 'prefetch_reg' is somewhat
Stephane> generic. We just have to keep a list of registers that will,
Stephane> soon, be required.
Stephane>
Stephane> Then, when we really need a register, the
Stephane> target_fetch_register() can look at the prefetch list that
Stephane> was built. It can then retrieve all of them depending on the
Stephane> remote protocol.

Until this moment, I've assumed that your implementation was in the
target layer.  But I've just reread your message and noticed that you
didn't say that.  In fact, it appears we've come to many of the same
conclusions.

One difference is that in my scheme, the target layer is explicitly
given prefetch hints and in yours the target layer accesses the hints
that are maintained above.  I'm not especially fond of either.  In my
scheme, each target must maintain its own infrastructure for recording
the prefetch list.  In yours, the target reaches above to get the list.
Perhaps the interface to target_fetch_registers() should be changed to
include a vector containing all the registers to be fetched.
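
As a sketch of what a target could do with such a vector, the C
fragment below picks a fetch strategy from the target's capabilities
and the number of registers requested.  The capability flags, the plan
names, and the "more than one register means fetch everything" rule
are all assumptions for illustration; a real target would likely weigh
the cost of its debug channel instead.

```c
#include <assert.h>
#include <stddef.h>

/* What a (hypothetical) target is able to do in one round trip.  */
enum fetch_caps
{
  CAP_FETCH_ALL = 1 << 0,       /* Whole register set at once.  */
  CAP_FETCH_ONE = 1 << 1,       /* A single named register.  */
  CAP_FETCH_SET = 1 << 2        /* An arbitrary list of registers.  */
};

enum fetch_plan { PLAN_NONE, PLAN_SET, PLAN_SINGLES, PLAN_ALL };

/* Choose how to satisfy a request for the N registers in REGNOS.  */
static enum fetch_plan
choose_fetch_plan (unsigned caps, const int *regnos, size_t n)
{
  assert (n == 0 || regnos != NULL);
  if (n == 0)
    return PLAN_NONE;
  if (caps & CAP_FETCH_SET)
    return PLAN_SET;            /* One request covers the whole vector.  */
  if ((caps & CAP_FETCH_ONE) && (n == 1 || !(caps & CAP_FETCH_ALL)))
    return PLAN_SINGLES;        /* Single reads: only option, or just one.  */
  assert (caps & CAP_FETCH_ALL);
  return PLAN_ALL;              /* Several wanted: fetch the full set once.  */
}
```

Targets that can only fetch all or only fetch one end up where they
are today; the vector only changes the outcome for targets that can
fetch a set, or that have a choice between the two.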

Stephane> The "gdb important registers" approach is interesting in
Stephane> that it gives good performance win and does not need to add
Stephane> the pre-fetch hints.

Many thanks for your comments.  I'm not ready to act on this yet, but
I'm better off now than I was before.

        --jtc

-- 
J.T. Conklin
RedBack Networks