From mboxrd@z Thu Jan 1 00:00:00 1970
From: jtc@redback.com (J.T. Conklin)
To: Stephane Carrez
Cc: gdb@sourceware.cygnus.com
Subject: Re: additional vector function to improve register fetch performance
Date: Mon, 11 Dec 2000 15:21:00 -0000
Message-id: <5mhf4apmry.fsf@jtc.redback.com>
References: <200012081053.LAA22829@sunchorus.France.Sun.COM>
X-SW-Source: 2000-12/msg00066.html

>>>>> "Stephane" == Stephane Carrez writes:

>> While we're talking about register fetch and stores, I had an idea the
>> other night about improving performance by adding some new functions
>> to the target vector.

Stephane> For gdb for ChorusOS (system debug), I'm doing some prefetch
Stephane> to improve performance. For each processor, I've defined a
Stephane> list of "gdb important registers". I retrieve the complete
Stephane> list when gdb asks for one of them.
Stephane>
Stephane> For example for Sparc, I've defined the following list:
Stephane>
Stephane>   sp, pc, rp, fp
Stephane>
Stephane> So, when gdb asks for either sp or pc, I retrieve these 4
Stephane> registers.
Stephane>
Stephane> In general, I've observed this is enough. For most
Stephane> processors Gdb needs the sp, pc and fp (in general after
Stephane> each stop). I'm using this for sparc, x86 and ppc.

I know well that providing a handful of registers can dramatically
improve performance. The GDB remote protocol has a mechanism such
that the values of key registers can be returned along with the
exception/signal number whenever execution stops. Long ago I
submitted a patch to add the pc, sp, and fp to the sample i386 and
m68k stubs. I don't know what ever happened to it; I should probably
dust it off and submit it again.

I'm not sure that your suggestion to prefetch all "gdb important
registers" when any in that set are read is generally applicable.
While this kind of optimization can be done in the target layer
reasonably easily for those GDB targets that are only used by one
target system, it is much more difficult for GDB targets that are
"generic" like remote.c.

One problem is that this breaks the abstraction layer between
target-independent and target-specific code. I admit that this is a
fuzzy line, but having remote.c (for example) know about "important
regs" clearly breaks it (IMO).

Another problem is that the target layer doesn't know what registers
are already stored in the regcache. This would require additions to
the regcache API so that it could be queried and only those registers
not already cached would be fetched.

Finally, this would have to be added to more than one target.

IMO, this makes it clear that a mechanism for handling "important
regs" in the target layer is flawed.

If the idea is pursued, it belongs above the target layer. This would
make a fetch of any of the "important regs" a prefetch for all of the
others in the set. This could be implemented reasonably cleanly, and
does not suffer from the problem you alluded to elsewhere in your
response: that you don't know when a register is going to be needed.
Adding a bunch of explicit prefetch hints may not help if you miss
one critical one.

But I still think that there needs to be some way for GDB to tell the
target layer that more than one register is going to be fetched. It
won't matter for those targets that fetch all registers at once or
those that can only fetch one at a time, but those that can fetch an
arbitrary set (or even those that can both fetch all and fetch one)
can't take advantage of that unless they know what registers will be
fetched. Otherwise the latency of a full register set fetch or
multiple single register fetches is still a problem.

>> * target_prefetch_register()
>>
>> With the register cache and targets that always fetch the entire
>> register set, fetch performance is as good as can be expected.
>> But with a target that can fetch one register at a time, GDB will
>> issue multiple single register fetches. Due to command/response
>> latency, this has a significant performance impact.
>>
>> One way this could be addressed is to always fetch the entire
>> register set. The remote protocol is like this: while it can set a
>> single register, there is no command to fetch one. This approach
>> may lose when the register set is large and the number of registers
>> to be fetched is small; it may be possible to issue several
>> single-register fetches in the time needed for one fetch of the
>> entire register set.

Stephane> Fetching all registers is a killer for PPC... In general,
Stephane> only the pc, lr and sp are used by Gdb (ok, except for
Stephane> arguments/locals).

This would be a problem even if the remote protocol supported a
read-single-register command, since GDB would fetch each register
sequentially. Whether to fetch the entire set or to fetch individual
registers depends on many factors: the number of registers, the size
of the register set, the bandwidth and latency of the debug channel.

Fortunately, as I mentioned earlier, the remote protocol allows for
register values to be returned when execution stops. Before I added
support to our powerpc stub, "step", "next", etc. were painful. To be
perfect, GDB needs to be able to take advantage of a
step-out-of-range command, but that's a different project.

Stephane> I like the idea of pre-fetching but I wouldn't introduce a
Stephane> new target vector for that. The 'prefetch_reg' is somewhat
Stephane> generic. We just have to keep a list of registers that will,
Stephane> soon, be required.
Stephane>
Stephane> Then, when we really need a register, the
Stephane> target_fetch_register() can look at the prefetch list that
Stephane> was built. It can then retrieve all of them depending on the
Stephane> remote protocol.

Until this moment, I've assumed that your implementation was in the
target layer.
But I've just reread your message and noticed that you didn't say
that. In fact, it appears we've come to many of the same conclusions.

One difference is that in my scheme, the target layer is explicitly
given prefetch hints and in yours the target layer accesses the hints
that are maintained above. I'm not especially fond of either. In my
scheme, each target must maintain its own infrastructure for
recording the prefetch list. In yours, the target reaches above to
get the list.

Perhaps the interface to target_fetch_registers() should be changed
to include a vector containing all the registers to be fetched.

Stephane> The "gdb important registers" approach is interesting in
Stephane> that it gives a good performance win and does not need to
Stephane> add the pre-fetch hints.

Many thanks for your comments. I'm not ready to act on this yet, but
I'm better off now than I was before.

        --jtc

--
J.T. Conklin
RedBack Networks
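
For illustration only, the closing suggestion above (a fetch interface
that takes a vector of registers) might look roughly like the sketch
below. Nothing here is GDB's actual API: NUM_REGS, the reg_value and
reg_valid arrays, toy_target_fetch_registers() and
fetch_register_set() are all hypothetical names, and the "target" is
a stand-in that counts round trips and fills in fake values. The
point is only that the core filters out registers already in the
cache and hands the rest to the target in a single call, so a target
that can fetch an arbitrary set pays for one command/response
exchange.

```c
#include <stddef.h>

#define NUM_REGS 8                      /* hypothetical register count */

/* Toy register cache: one value and one valid flag per register. */
unsigned long reg_value[NUM_REGS];
int reg_valid[NUM_REGS];

/* Number of simulated command/response exchanges with the target. */
int round_trips;

/* Hypothetical target hook: fetches an arbitrary set of registers in
   one round trip.  A real target would talk to the debug channel;
   this one only fabricates values.  */
void toy_target_fetch_registers(const int *regnos, size_t n)
{
    size_t i;

    round_trips++;
    for (i = 0; i < n; i++) {
        reg_value[regnos[i]] = 0x1000ul + (unsigned long) regnos[i];
        reg_valid[regnos[i]] = 1;
    }
}

/* Core-side helper (above the target layer): drop registers that are
   out of range, duplicated, or already cached, then pass the rest to
   the target as one vector.  No target call is made if nothing is
   missing.  */
void fetch_register_set(const int *regnos, size_t n)
{
    int needed[NUM_REGS];
    int queued[NUM_REGS] = { 0 };
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int r = regnos[i];

        if (r < 0 || r >= NUM_REGS || reg_valid[r] || queued[r])
            continue;
        queued[r] = 1;
        needed[count++] = r;
    }
    if (count > 0)
        toy_target_fetch_registers(needed, count);
}
```

With this shape, asking for a set such as { sp, pc, fp } costs one
round trip the first time and none the second, because the filtering
is done against the cache before the target is called. It also keeps
the list of "important" registers out of the target code, which is
the layering concern raised earlier in the message.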