Re: additional vector function to improve register fetch performance

public inbox for gdb@sourceware.org

* Re: additional vector function to improve  register fetch performance
@ 2000-12-08  2:53 Stephane Carrez
  2000-12-11 15:21 ` J.T. Conklin
  0 siblings, 1 reply; 5+ messages in thread
From: Stephane Carrez @ 2000-12-08  2:53 UTC
  To: gdb, jtc

Hi!

> While we're talking about register fetches and stores, I had an idea
> the other night about improving performance by adding some new
> functions to the target vector.
> 

For gdb for ChorusOS (system debug), I'm doing some prefetching to
improve performance. For each processor, I've defined a list of
"gdb important registers". I retrieve the complete list when gdb asks
for any one of them.

For example for Sparc, I've defined the following list:

 	sp, pc, rp, fp

So, when gdb asks for either sp or pc, I retrieve these 4 registers.

In general, I've observed that this is enough. For most processors,
Gdb needs the sp, pc and fp (generally after each stop). I'm using this
for sparc, x86 and ppc.
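
The grouping described above can be sketched roughly as follows. This
is only an illustration, not the ChorusOS code: the register numbers,
the important_regs table and regs_to_fetch() are hypothetical names
made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register numbers, for illustration only.  */
enum { REG_SP, REG_PC, REG_RP, REG_FP, REG_G1, NUM_REGS };

/* The "gdb important registers" for this made-up processor.  */
static const int important_regs[] = { REG_SP, REG_PC, REG_RP, REG_FP };
#define N_IMPORTANT (sizeof important_regs / sizeof important_regs[0])

static bool
is_important (int regnum)
{
  for (size_t i = 0; i < N_IMPORTANT; i++)
    if (important_regs[i] == regnum)
      return true;
  return false;
}

/* Fill OUT with the registers to retrieve when REGNUM is requested and
   return how many were written.  OUT needs room for N_IMPORTANT
   entries.  A request for any important register pulls in the whole
   list; any other register is fetched on its own.  */
size_t
regs_to_fetch (int regnum, int *out)
{
  assert (regnum >= 0 && regnum < NUM_REGS);
  if (!is_important (regnum))
    {
      out[0] = regnum;
      return 1;
    }
  for (size_t i = 0; i < N_IMPORTANT; i++)
    out[i] = important_regs[i];
  return N_IMPORTANT;
}
```

With this table, a request for pc yields all four registers in one
round trip, while a request for an ordinary register stays a single
fetch.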

> * target_prefetch_register()
> 
>   With the register cache and targets that always fetch the entire
>   register set, fetch performance is as good as can be expected.  But
>   with a target that can fetch one register at a time, GDB will issue
>   multiple single register fetches.  Due to command/response latency,
>   this has a significant performance impact.
> 
>   One way this could be addressed is to always fetch the entire
>   register set.  The remote protocol is like this: while it can set a
>   single register, there is no command to fetch one.  This approach
>   may lose when the register set is large and the number of registers
>   to be fetched is small; it may be possible to issue several
>   single-register fetches in the time one fetch of the entire
>   register set takes.

Fetching all registers is a killer for PPC... In general, only
the pc, lr and sp are used by Gdb (ok, except for arguments/locals).

> 
>   Another is the proposed target_prefetch_register() vector function.
>   All this does is give a hint that we'll need the value of a register
>   sometime "soon".  A sequence like:
> 
>              sp = read_sp ();
>              fp = read_fp ();
>              pc = read_pc ();
>              r0 = read_register (R0_REGNUM);
> 
>   Might be changed to:
> 
>              prefetch_sp ();
>              prefetch_fp ();
>              prefetch_pc ();
>              prefetch_register (R0_REGNUM);
>              sp = read_sp ();
>              fp = read_fp ();
>              pc = read_pc ();
>              r0 = read_register (R0_REGNUM);
> 
>   (I'm assuming the prefetch* functions are added to regcache.c to do
>   whatever housekeeping is required and call the target vector
>   function).
> 
>   In a trivial target, prefetch would do nothing.  In one that has an
>   async protocol, it might start fetching those registers (a callback
>   would install the values in the cache when they were received).
>   In one that could do single or full register set fetches, it might
>   defer fetching anything until the first "real" read was received.
>   At that time it would decide what type of fetch is optimal to
>   perform.
> 
>   The disadvantage of this is that there is no benefit if the prefetch
>   hints aren't added.  The good thing is that it keeps the interface
>   between the target independent and target specific parts of GDB
>   reasonably clean.  For contrast, imagine a target vector function
>   that took a list of registers to read.  This (IMO) would be much
>   more difficult to use effectively.
> 
> Thoughts?  I have some partially thought-out ideas on how to do the same
> for register stores, but I'm going to wait until they've firmed up a
> bit before sharing.
> 
>         --jtc
>   
> -- 
> J.T. Conklin
> RedBack Networks

I like the idea of pre-fetching but I wouldn't introduce a new target
vector for that. The 'prefetch_reg' is somewhat generic. We just have
to keep a list of registers that will, soon, be required.

Then, when we really need a register, the target_fetch_register() can
look at the prefetch list that was built. It can then retrieve all of
them depending on the remote protocol.

Adding prefetch hints might be difficult, and sometimes you will gain
nothing. This is because when you are in the Gdb-generic code, you don't
know in advance which registers you will need. For example, when the
frame is computed, the processor-specific code is called. Since this is
processor-specific, you don't know which registers to prefetch. Adding
the pre-fetch in the *-tdep.c files will not help you because in general
you need the register right now (e.g. sparc_saved_pc_after_call,
rs6000_saved_pc_after_call).

The "gdb important registers" approach is interesting in that it gives
a good performance win and does not require adding the pre-fetch hints.

	Stephane

-	-	-	-	-	-	-	-	-	-
Stephane |Sun Microsystems			|
 Carrez	 |Network Service Provider Division	| http://www.sun.com
	 |6 avenue Gustave Eiffel		|
	 |F-78182, St-Quentin-en-Yvelines-Cedex |

email: Stephane.Carrez@France.Sun.COM

* Re: additional vector function to improve  register fetch performance
  2000-12-08  2:53 additional vector function to improve register fetch performance Stephane Carrez
@ 2000-12-11 15:21 ` J.T. Conklin
  0 siblings, 0 replies; 5+ messages in thread
From: J.T. Conklin @ 2000-12-11 15:21 UTC
  To: Stephane Carrez; +Cc: gdb

>>>>> "Stephane" == Stephane Carrez <ciceron@sunchorus.France.Sun.COM> writes:
>> While we're talking about register fetches and stores, I had an idea
>> the other night about improving performance by adding some new
>> functions to the target vector.

Stephane> For gdb for ChorusOS (system debug), I'm doing some prefetch
Stephane> to improve performance. For each processor, I've defined a
Stephane> list of "gdb important registers". I retrieve the complete
Stephane> list when gdb asks for one of them.
Stephane>
Stephane> For example for Sparc, I've defined the following list:
Stephane>
Stephane>  	sp, pc, rp, fp
Stephane>
Stephane> So, when gdb asks for either sp or pc, I retrieve these 4
Stephane> registers.
Stephane>
Stephane> In general, I've observed this is enough. For most
Stephane> processors Gdb needs the sp, pc and fp (in general after
Stephane> each stop). I'm using this for sparc, x86 and ppc.

I know well that providing a handful of registers can dramatically
improve performance.  

The GDB remote protocol has a mechanism by which the values of key
registers can be returned along with the exception/signal number
whenever execution stops.  Long ago I submitted a patch to add the pc,
sp, and fp to the sample i386 and m68k stubs.  I don't know what ever
happened to it; I should probably dust it off and submit it again.

I'm not sure that your suggestion to prefetch all "gdb important
registers" when any in that set are read is generally applicable.
While this kind of optimization can be done in the target layer
reasonably easily for those GDB targets that are only used by one
target system, it is much more difficult for GDB targets that are
"generic" like remote.c.

One problem is that this breaks the abstraction layer between target-
independent and target specific code.  I admit that this is a fuzzy
line, but having remote.c (for example) know about "important regs"
clearly breaks it (IMO).

Another problem is that the target layer doesn't know what registers
are already stored in the regcache.  This would require additions to
the regcache API so that the target could query it and fetch only
those registers that are not already cached.
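
As a rough sketch of the kind of regcache query meant here, the target
could filter a wanted list against per-register validity flags.  None
of these names are existing GDB functions; cache_has_reg(),
cache_mark_valid() and regs_still_needed() are hypothetical, and the
flag array merely stands in for the real cache state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGS 8   /* hypothetical register count */

/* Stand-in for the cache's per-register validity flags.  */
static bool reg_valid[NUM_REGS];

/* Hypothetical query the regcache would export to targets.  */
bool
cache_has_reg (int regnum)
{
  assert (regnum >= 0 && regnum < NUM_REGS);
  return reg_valid[regnum];
}

/* Hypothetical hook called once a register value has arrived.  */
void
cache_mark_valid (int regnum)
{
  assert (regnum >= 0 && regnum < NUM_REGS);
  reg_valid[regnum] = true;
}

/* Copy into OUT those entries of WANTED that are not yet cached and
   return the count.  OUT needs room for N entries.  */
size_t
regs_still_needed (const int *wanted, size_t n, int *out)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (!cache_has_reg (wanted[i]))
      out[count++] = wanted[i];
  return count;
}
```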

Finally, this would have to be added to more than one target.  IMO,
this makes it clear that a mechanism for handling "important regs" in
the target layer is flawed.  If the idea is pursued, it belongs above
the target layer.  This would make a fetch of any of the "important
regs" a prefetch for all of the others in the set.

This could be implemented reasonably cleanly, and does not suffer from
the problem you alluded to elsewhere in your response: that you don't
know when a register is going to be needed.  Adding a bunch of explicit
prefetch hints may not help if you miss one critical one.

But I still think that there needs to be some way for GDB to tell the
target layer that more than one register is going to be fetched.  It
won't matter for those targets that fetch all registers at once or
those that can only fetch one at a time, but those that can fetch an
arbitrary set (or even those that can both fetch all and fetch one)
can't take advantage of that unless they know what registers will be
fetched.  Otherwise the latency of a full register set fetch or
multiple single register fetches is still a problem.

>> * target_prefetch_register()
>> 
>> With the register cache and targets that always fetch the entire
>> register set, fetch performance is as good as can be expected.  But
>> with a target that can fetch one register at a time, GDB will issue
>> multiple single register fetches.  Due to command/response latency,
>> this has a significant performance impact.
>> 
>> One way this could be addressed is to always fetch the entire
>> register set.  The remote protocol is like this: while it can set a
>> single register, there is no command to fetch one.  This approach
>> may lose when the register set is large and the number of registers
>> to be fetched is small; it may be possible to issue several
>> single-register fetches in the time one fetch of the entire
>> register set takes.

Stephane> Fetching all registers is a killer for PPC... In general,
Stephane> only the pc, lr and sp are used by Gdb (ok, except for
Stephane> arguments/locals).

This would be a problem even if the remote protocol supported a
read-single-register command, since GDB would fetch each register
sequentially.  Whether to fetch the entire set or to fetch individual
registers depends on many factors: the number of registers, the size
of the register set, and the bandwidth and latency of the debug channel.

Fortunately, as I mentioned earlier, the remote protocol allows for
register values to be returned when execution stops.  Before I added
support to our powerpc stub, "step", "next", etc. were painful.  To
be perfect, GDB needs to be able to take advantage of a step-out-of-
range command; but that's a different project.
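
For reference, the stop reply in question has the form "TAAn:r;n:r;..."
where AA is the signal number in hex, n a register number in hex and r
the register contents as hex digits in target byte order.  A minimal
parsing sketch, assuming the packet framing and checksum have already
been stripped, might look like the following.  The struct and function
names are hypothetical, and which register numbers correspond to pc,
sp or fp depends on the target.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HEX 32   /* longest value kept: 16 bytes as hex digits */

struct stop_reg
{
  int regnum;
  char hex[MAX_HEX + 1];   /* value as sent, in target byte order */
};

/* Parse "TAAn:r;n:r;..." into SIGNO and at most MAX entries of REGS.
   Returns the number of registers stored, or -1 on malformed input.  */
int
parse_stop_reply (const char *pkt, int *signo, struct stop_reg *regs, int max)
{
  assert (pkt != NULL && signo != NULL && regs != NULL);
  if (pkt[0] != 'T'
      || !isxdigit ((unsigned char) pkt[1])
      || !isxdigit ((unsigned char) pkt[2]))
    return -1;
  char sigbuf[3] = { pkt[1], pkt[2], '\0' };
  *signo = (int) strtol (sigbuf, NULL, 16);

  int count = 0;
  const char *p = pkt + 3;
  while (*p != '\0' && count < max)
    {
      const char *colon = strchr (p, ':');
      const char *semi = colon != NULL ? strchr (colon, ';') : NULL;
      if (semi == NULL)
        return -1;
      char *end;
      long n = strtol (p, &end, 16);
      size_t len = (size_t) (semi - colon - 1);
      /* Keep only "hexnum:value" pairs; skip keys that are not plain
         hex numbers and values too long for this sketch.  */
      if (end == colon && end != p && len <= MAX_HEX)
        {
          regs[count].regnum = (int) n;
          memcpy (regs[count].hex, colon + 1, len);
          regs[count].hex[len] = '\0';
          count++;
        }
      p = semi + 1;
    }
  return count;
}
```

Each pair found this way can be installed in the register cache before
GDB asks for it, so the reads that follow a stop cost no extra round
trip.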

Stephane> I like the idea of pre-fetching but I wouldn't introduce a
Stephane> new target vector for that. The 'prefetch_reg' is somewhat
Stephane> generic. We just have to keep a list of registers that will,
Stephane> soon, be required.
Stephane>
Stephane> Then, when we really need a register, the
Stephane> target_fetch_register() can look at the prefetch list that
Stephane> was built. It can then retrieve all of them depending on the
Stephane> remote protocol.

Until this moment, I've assumed that your implementation was in the
target layer.  But I've just reread your message and noticed that you
didn't say that.  In fact, it appears we've come to many of the same
conclusions.

One difference is that in my scheme, the target layer is explicitly
given prefetch hints and in yours the target layer accesses the hints
that are maintained above.  I'm not especially fond of either.  In my
scheme, each target must maintain its own infrastructure for recording
the prefetch list.  In yours, the target reaches above to get the list.
Perhaps the interface to target_fetch_registers() should be changed to
include a vector containing all the registers to be fetched.
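
One possible shape for such an interface is sketched below.  The hint
list lives above the target layer and is handed over in a single call
on the first real read, so the target neither keeps its own list nor
reaches upward.  All names here (fetch_registers_fn, note_prefetch,
fetch_with_hints, recording_fetch) are hypothetical, and the sketch
leaves out any check of what is already cached.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 16   /* hypothetical register count */

/* Hypothetical target entry point: fetch the N registers named in
   REGNUMS with whatever mix of requests suits the protocol.  */
typedef void (*fetch_registers_fn) (const int *regnums, size_t n);

/* Pending hints, kept above the target layer.  */
static int pending[NUM_REGS];
static size_t n_pending;

/* Record that REGNUM will be wanted soon; duplicates are ignored.  */
void
note_prefetch (int regnum)
{
  assert (regnum >= 0 && regnum < NUM_REGS);
  for (size_t i = 0; i < n_pending; i++)
    if (pending[i] == regnum)
      return;
  pending[n_pending++] = regnum;
}

/* A "real" read of REGNUM: pass it plus every pending hint to the
   target in one call, then start a fresh hint list.  */
void
fetch_with_hints (fetch_registers_fn fetch, int regnum)
{
  note_prefetch (regnum);
  fetch (pending, n_pending);
  n_pending = 0;
}

/* Example target that only records what it was asked for.  */
static int last_request[NUM_REGS];
static size_t last_n;

void
recording_fetch (const int *regnums, size_t n)
{
  last_n = n;
  for (size_t i = 0; i < n; i++)
    last_request[i] = regnums[i];
}
```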

Stephane> The "gdb important registers" approach is interesting in
Stephane> that it gives good performance win and does not need to add
Stephane> the pre-fetch hints.

Many thanks for your comments.  I'm not ready to act on this yet, but
I'm better off now than I was before.

        --jtc

-- 
J.T. Conklin
RedBack Networks

* Re: additional vector function to improve  register fetch performance
  2000-12-14 16:59 ` Andrew Cagney
@ 2000-12-15 14:56   ` J.T. Conklin
  0 siblings, 0 replies; 5+ messages in thread
From: J.T. Conklin @ 2000-12-15 14:56 UTC
  To: Andrew Cagney; +Cc: gdb

>>>>> "Andrew" == Andrew Cagney <ac131313@cygnus.com> writes:
>> Might be changed to:
>> 
>> prefetch_sp ();
>> prefetch_fp ();
>> prefetch_pc ();
>> prefetch_register (R0_REGNUM);
>> sp = read_sp ();
>> fp = read_fp ();
>> pc = read_pc ();
>> r0 = read_register (R0_REGNUM);
>> 
>> (I'm assuming the prefetch* functions are added to regcache.c to do
>> whatever housekeeping is required and call the target vector
>> function).

Andrew> To follow on from Stephane's comments.  I have a feeling that
Andrew> this would make things unnecessarily complicated.
Andrew>
Andrew> Looking at the remote case and the ``T'' packet.  There isn't
Andrew> any reason why the ``T'' packet doesn't just return all of the
Andrew> registers.  Across a TCP connection that should have zero
Andrew> marginal cost.

Note that while I primarily use the remote protocol, I am looking at
this issue of register fetch performance more generically.

The problem is that the upper layers of GDB do not communicate with the
target layer in such a way that a target that can perform different
types of register fetches can select the optimal one.

This is not an issue if the target can only fetch the entire register
set.  But if a target can fetch a subset of registers, I think it is
desirable for GDB to take advantage of that ability.

Imagine a processor with 32 4-byte integer registers, 32 4-byte
floating point registers, and a handful of control and other
miscellaneous registers.  If registers are ascii-hex encoded, the
register data will be ~1K bytes.  On a 9600bps link, assuming no other
overhead, it will take over one second to receive.  If there were some
way to fetch only the registers that were needed, we'd be able to
significantly improve that time.  Even if the protocol is only able to
fetch a single register at a time, assuming "reasonable" command
latency GDB might be able to issue multiple fetch register commands in
less time.
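
The arithmetic behind this comparison can be written down as a small
sketch.  The function names are made up, and the 10 bits per character
(8N1 serial framing), the 50 ms round-trip latency and the 12
characters per single-register reply used in the note below are
assumptions for illustration, not measurements.

```c
#include <assert.h>

/* Seconds to move CHARS characters over a link running at BPS bits
   per second, with BITS_PER_CHAR bits on the wire per character
   (10 for 8N1 framing: start bit, 8 data bits, stop bit).  */
double
transfer_seconds (double chars, double bps, double bits_per_char)
{
  assert (bps > 0);
  return chars * bits_per_char / bps;
}

/* Seconds for K single-register fetches, each paying one round-trip
   latency plus the transfer time of its own small reply.  */
double
single_fetch_seconds (int k, double latency_s, double chars_each,
                      double bps, double bits_per_char)
{
  return k * (latency_s + transfer_seconds (chars_each, bps, bits_per_char));
}
```

Under these assumptions, transfer_seconds (1024, 9600, 10) comes to
about 1.07 seconds for the full set, while single_fetch_seconds (4,
0.05, 12, 9600, 10) comes to 0.25 seconds for four registers.  The
balance shifts as the latency or the number of registers needed grows.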

The reason I suggested a cache model is that caches and prefetching
are already well understood concepts.  

Andrew> The other thing to consider is given a request for reg N, the
Andrew> target could satisfy it by fetching an entire block of
Andrew> registers M<=N<=O and entering them all into the cache.

How are M and O selected?  A target specific back end has knowledge
about the target and can select an appropriate block: Wind River's WDB
breaks things into integer registers, floating point registers,
control registers, etc. which works quite well in practice.  But I
don't know how something similar could be done with a generic target
like remote.c.
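
In a target-specific back end, the selection could be as simple as a
table lookup.  The sketch below uses a hypothetical layout that matches
the processor imagined above (32 integer, 32 floating point and a few
control registers); it is not WDB's actual grouping, and the names are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct reg_block
{
  const char *name;
  int first;   /* M: lowest register number in the block */
  int last;    /* O: highest register number in the block */
};

/* Hypothetical layout: 32 integer, 32 floating point, 6 control.  */
static const struct reg_block blocks[] = {
  { "integer", 0, 31 },
  { "float", 32, 63 },
  { "control", 64, 69 },
};

/* Return the block containing REGNUM, or NULL if no block covers it.
   A request for register N then becomes a fetch of FIRST..LAST.  */
const struct reg_block *
block_for_register (int regnum)
{
  assert (regnum >= 0);
  for (size_t i = 0; i < sizeof blocks / sizeof blocks[0]; i++)
    if (regnum >= blocks[i].first && regnum <= blocks[i].last)
      return &blocks[i];
  return NULL;
}
```

A generic target has no such table to consult, which is the difficulty
raised above.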

Andrew> In a way the ``g'' (registers) packet does this - you ask for
Andrew> one register and get back a block.  Unfortunately the block is
Andrew> very large and fixed.

Agreed.  And as long as the remote protocol only requests full packets,
performance is reasonable.  But if you added support for a fetch single
register command to complement the existing store single register
command, I think you would find that performance would suffer because
GDB would issue multiple single register fetches when a single fetch of
the full register set would be optimal.  Today you can see the same
with single register stores in some circumstances.

I think we could recommend that single register stores only be
implemented for those targets where DECR_PC_AFTER_BREAK != 0.

        --jtc

-- 
J.T. Conklin
RedBack Networks

* Re: additional vector function to improve  register fetch performance
  2000-12-07 14:17 J.T. Conklin
@ 2000-12-14 16:59 ` Andrew Cagney
  2000-12-15 14:56   ` J.T. Conklin
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Cagney @ 2000-12-14 16:59 UTC
  To: jtc; +Cc: gdb

"J.T. Conklin" wrote:

>   Might be changed to:
> 
>              prefetch_sp ();
>              prefetch_fp ();
>              prefetch_pc ();
>              prefetch_register (R0_REGNUM);
>              sp = read_sp ();
>              fp = read_fp ();
>              pc = read_pc ();
>              r0 = read_register (R0_REGNUM);
> 
>   (I'm assuming the prefetch* functions are added to regcache.c to do
>   whatever housekeeping is required and call the target vector
>   function).

To follow on from Stephane's comments.  I have a feeling that this would
make things unnecessarily complicated.

Looking at the remote case and the ``T'' packet: there isn't any reason
why the ``T'' packet shouldn't just return all of the registers.  Across
a TCP connection that should have zero marginal cost.

The other thing to consider is given a request for reg N, the target
could satisfy it by fetching an entire block of registers M<=N<=O and
entering them all into the cache.

In a way the ``g'' (registers) packet does this - you ask for one
register and get back a block.  Unfortunately the block is very large
and fixed.

	Andrew

* additional vector function to improve  register fetch performance
@ 2000-12-07 14:17 J.T. Conklin
  2000-12-14 16:59 ` Andrew Cagney
  0 siblings, 1 reply; 5+ messages in thread
From: J.T. Conklin @ 2000-12-07 14:17 UTC
  To: gdb

While we're talking about register fetches and stores, I had an idea
the other night about improving performance by adding some new
functions to the target vector.

* target_prefetch_register()

  With the register cache and targets that always fetch the entire
  register set, fetch performance is as good as can be expected.  But
  with a target that can fetch one register at a time, GDB will issue
  multiple single register fetches.  Due to command/response latency,
  this has a significant performance impact.

  One way this could be addressed is to always fetch the entire
  register set.  The remote protocol is like this: while it can set a
  single register, there is no command to fetch one.  This approach
  may lose when the register set is large and the number of registers
  to be fetched is small; it may be possible to issue several
  single-register fetches in the time one fetch of the entire
  register set takes.

  Another is the proposed target_prefetch_register() vector function.
  All this does is give a hint that we'll need the value of a register
  sometime "soon".  A sequence like:

             sp = read_sp ();
             fp = read_fp ();
             pc = read_pc ();
             r0 = read_register (R0_REGNUM);

  Might be changed to:

             prefetch_sp ();
             prefetch_fp ();
             prefetch_pc ();
             prefetch_register (R0_REGNUM);
             sp = read_sp ();
             fp = read_fp ();
             pc = read_pc ();
             r0 = read_register (R0_REGNUM);

  (I'm assuming the prefetch* functions are added to regcache.c to do
  whatever housekeeping is required and call the target vector
  function).

  In a trivial target, prefetch would do nothing.  In one that has an
  async protocol, it might start fetching those registers (a callback
  would install the values in the cache when they were received).
  In one that could do single or full register set fetches, it might
  defer fetching anything until the first "real" read was received.
  At that time it would decide what type of fetch is optimal to
  perform.

  The disadvantage of this is that there is no benefit if the prefetch
  hints aren't added.  The good thing is that it keeps the interface
  between the target independent and target specific parts of GDB
  reasonably clean.  For contrast, imagine a target vector function
  that took a list of registers to read.  This (IMO) would be much
  more difficult to use effectively.
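
  To make the proposal concrete, here is one way the hint could be
  threaded through a target vector.  This is a sketch under assumed
  names (struct target_sketch, to_prefetch_register, choose_fetch),
  not existing GDB code, and the rule used to pick a fetch type is a
  placeholder rather than a tuned policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 32   /* hypothetical; fits one bit per register */

enum fetch_kind { FETCH_SINGLE, FETCH_ALL };

struct target_sketch
{
  /* Hint hook; NULL for a trivial target that ignores hints.  */
  void (*to_prefetch_register) (struct target_sketch *t, int regnum);
  uint32_t hinted;   /* bit N set: register N was hinted */
};

/* regcache-level wrapper: forward the hint if the target wants it.  */
void
prefetch_register (struct target_sketch *t, int regnum)
{
  assert (regnum >= 0 && regnum < NUM_REGS);
  if (t->to_prefetch_register != NULL)
    t->to_prefetch_register (t, regnum);
}

/* A deferring target just remembers the hint.  */
void
deferring_prefetch (struct target_sketch *t, int regnum)
{
  t->hinted |= (uint32_t) 1 << regnum;
}

/* Called on the first "real" read of REGNUM: pick a fetch type and
   forget the hints.  Placeholder rule: more than one register wanted
   means fetch the whole set.  */
enum fetch_kind
choose_fetch (struct target_sketch *t, int regnum)
{
  assert (regnum >= 0 && regnum < NUM_REGS);
  uint32_t wanted = t->hinted | ((uint32_t) 1 << regnum);
  int count = 0;
  for (uint32_t m = wanted; m != 0; m >>= 1)
    count += (int) (m & 1);
  t->hinted = 0;
  return count > 1 ? FETCH_ALL : FETCH_SINGLE;
}
```

  A target with a NULL hook behaves exactly as it does today, which is
  the "trivial target" case above.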

Thoughts?  I have some partially thought-out ideas on how to do the same
for register stores, but I'm going to wait until they've firmed up a
bit before sharing.

        --jtc

-- 
J.T. Conklin
RedBack Networks
