Questions on XML register descriptions

public inbox for gdb@sourceware.org

* Questions on XML register descriptions
@ 2023-12-26 22:42 H. Peter Anvin
  2023-12-27 11:28 ` Luis Machado
  0 siblings, 1 reply; 5+ messages in thread
From: H. Peter Anvin @ 2023-12-26 22:42 UTC
  To: gdb

Hi,

I have been looking at the XML register descriptions, and I have a 
couple of questions, as I'm trying to add gdb remote support to an 
emulator, which involves additional system-level registers as well as 
memory spaces.

1. Some registers can't be written to, and some registers may affect 
other registers. What, if anything, is the best way to handle that?

2. How do the G/g commands interact with the XML target descriptions? Do 
they apply specifically to group="general", or are there other rules?

3. Is there any concept at all of different address spaces in gdb? 
Alternatively, is there any way to tell gdb that it should communicate 
addresses wider than the pointer type of the base architecture (which 
would allow a Python helper script to define a convenience API)?

4. There doesn't seem to be a way to trigger a Python function when a 
connection is *established*, which is as far as I understand the point 
at which the XML register description is downloaded from the remote, and 
definitely the first point at which register values can be examined. Am 
I missing something obvious?

5. I presume the byte codes used for the Z commands are the same as the 
agent bytecode in Appendix F, even though the latter specifically seems 
to be referring to tracepoints?

Many thanks,

	-hpa


* Re: Questions on XML register descriptions
  2023-12-26 22:42 Questions on XML register descriptions H. Peter Anvin
@ 2023-12-27 11:28 ` Luis Machado
  2023-12-28 22:09   ` H. Peter Anvin
  0 siblings, 1 reply; 5+ messages in thread
From: Luis Machado @ 2023-12-27 11:28 UTC
  To: H. Peter Anvin, gdb

Hi,

On 12/26/23 22:42, H. Peter Anvin via Gdb wrote:
> Hi,
> 
> I have been looking at the XML register descriptions, and I have a couple of questions, as I'm trying to add gdb remote support to an emulator, which involves additional system-level registers as well as memory spaces.
> 
> 1. Some registers can't be written to, and some registers may affect other registers. What, if anything, is the best way to handle that?

There isn't a single right way. You could teach gdb about those registers explicitly (not ideal) or you could make the remote just ignore or error out when gdb tries to modify such a register.

As for side-effects (changing a register has an effect on a different register), this might be best implemented via p/P packets. With those gdb can write out one specific register and then it re-reads the rest of the registers.

In such a case, you get the opportunity to apply the side-effects to other registers in time for sending the updated register buffer block.

This is a bit odd with the g/G packets, as it gets confusing having to write all registers at once, and you might not know what values have changed or not.
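
A minimal sketch of how a stub might handle this, assuming a made-up register table (pc, bank, bankbase) and a made-up side-effect rule. Only the packet shapes ("p n", "P n=r") and the reply forms ("OK", "E NN", empty for unsupported) come from the remote protocol; register numbers are hex, and values are hex bytes in target byte order (little-endian here).

```python
# Sketch of 'p'/'P' handling in a remote stub.  The register table and the
# side-effect rule below are hypothetical, purely for illustration.
REGS = {
    0: {"name": "pc", "size": 2, "value": 0x0000, "writable": True},
    1: {"name": "bank", "size": 1, "value": 0x00, "writable": True},
    2: {"name": "bankbase", "size": 2, "value": 0x0000, "writable": False},
}


def encode(value, size):
    """Hex-encode VALUE as SIZE bytes in little-endian (target) order."""
    return value.to_bytes(size, "little").hex()


def decode(text):
    """Decode a little-endian hex string into an integer."""
    return int.from_bytes(bytes.fromhex(text), "little")


def apply_side_effects(regnum):
    """Hypothetical rule: writing 'bank' updates the read-only 'bankbase'."""
    if regnum == 1:
        REGS[2]["value"] = (REGS[1]["value"] << 14) & 0xFFFF


def handle_packet(payload):
    """Return the reply payload for a single 'p' or 'P' packet."""
    if payload.startswith("p"):
        reg = REGS.get(int(payload[1:], 16))
        if reg is None:
            return "E01"
        return encode(reg["value"], reg["size"])
    if payload.startswith("P"):
        num_text, _, value_text = payload[1:].partition("=")
        regnum = int(num_text, 16)
        reg = REGS.get(regnum)
        # Refuse unknown or read-only registers, and wrongly sized values.
        if reg is None or not reg["writable"]:
            return "E01"
        if len(value_text) != 2 * reg["size"]:
            return "E01"
        reg["value"] = decode(value_text)
        apply_side_effects(regnum)
        return "OK"
    return ""  # empty reply: packet not supported
```

Because the side effect is applied before "OK" is sent, any register read that gdb issues afterwards already sees the updated bankbase value.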

> 
> 2. How do the G/g commands interact with the XML target descriptions? Do they apply specifically to group="general", or are there other rules?

My recollection of it is that G/g will request all of the reported XML registers with non-zero sizes. If p/P is also supported, gdb will attempt to fetch things using g and write things using P.

I vaguely recall something about gdb not requesting some register in g, but I'm not sure if that's still a thing.

> 
> 3. Is there any concept at all of different address spaces in gdb? Alternatively, is there any way to tell gdb that it should communicate addresses wider than the pointer type of the base architecture (which would allow a Python helper script to define a convenience API)?
> 

Yes, but the architecture-specific / os-specific layers need to be taught about it. Ideally the compiler would also establish a way of communicating this information to debuggers via the generated DWARF info. See DW_AT_address_class in the DWARF standard.

Are we talking about short pointers here? Say, a 16-bit pointer in a 32-bit address space?

> 4. There doesn't seem to be a way to trigger a Python function when a connection is *established*, which is as far as I understand the point at which the XML register description is downloaded from the remote, and definitely the first point at which register values can be examined. Am I missing something obvious?
> 

The Python API is still growing. If this is needed, I'd make the case to the gdb community. Patches are always welcome.

> 5. I presume the byte codes used for the Z commands are the same as the agent bytecode in Appendix F, even though the latter specifically seems to be referring to tracepoints?

Yes, the byte codes used for evaluating breakpoint conditions are the same as the ones used for tracepoints. The documentation could make that a bit more explicit.

> 
> Many thanks,
> 
>     -hpa


* Re: Questions on XML register descriptions
  2023-12-27 11:28 ` Luis Machado
@ 2023-12-28 22:09   ` H. Peter Anvin
  2023-12-29 13:41     ` Luis Machado
  0 siblings, 1 reply; 5+ messages in thread
From: H. Peter Anvin @ 2023-12-28 22:09 UTC
  To: Luis Machado, gdb

Hi and thanks for responding!

So more concrete background: the simulator is for a family of Z80 
systems developed between 1978 and 1983. Z80 of course has both memory 
and I/O spaces, and on top of that, the later models in the series (and 
even the first ones with add-ons) had to do paging tricks to deal with 
the limitations of a 16-bit address space.

This is not visible to the CPU, so sizeof(void *) == 2.

On 12/27/23 03:28, Luis Machado wrote:
>>
>> 1. Some registers can't be written to, and some registers may affect other registers. What, if anything, is the best way to handle that?
> 
> There isn't a single right way. You could teach gdb about those registers explicitly (not ideal) or you could make the remote just ignore or error out when gdb tries to modify such a register.
> 
> As for side-effects (changing a register has an effect on a different register), this might be best implemented via p/P packets. With those gdb can write out one specific register and then it re-reads the rest of the registers.
> 
> In such a case, you get the opportunity to apply the side-effects to other registers in time for sending the updated register buffer block.
> 
> This is a bit odd with the g/G packets, as it gets confusing having to write all registers at once, and you might not know what values have changed or not.
> 

This is of course true, but it is gdb that selects between gG and pP 
packets, not the remote (which is what I control).

Now, what I *can* do is to limit the return size of the g packet. If 
that means gdb will use pP packets to access other registers if/when 
requested, then that should deal with the problem.

>>
>> 2. How do the G/g commands interact with the XML target descriptions? Do they apply specifically to group="general", or are there other rules?
> 
> My recollection of it is that G/g will request all of the reported XML registers with non-zero sizes. If p/P is also supported, gdb will attempt to fetch things using g and write things using P.
> 

That solves a lot of problems, although I expect that there might still 
be issues with the register cache... but that is a second-order problem 
if only a bit annoying.

> I vaguely recall something about gdb not requesting some register in g, but I'm not sure if that's still a thing.

So gdb doesn't specify anything about what g is supposed to return. The 
documentation only says "general registers", which *sounds* like 
group="general", but I have no real idea.

The description of register groups says that "some register groups" have 
built-in meaning to gdb, but I cannot find *where* those groups 
and their associated meanings are described, except that "general", 
"float" and "vector" are described (but not their specific semantics), 
and "all" is mentioned elsewhere (the naming of which is kind of obvious).

There is an option "save-restore" in the register description; it is 
unclear to me how it interacts with the gG packets, but it would at least 
avoid the problem of gdb trying to restore registers that cannot be 
restored or would be ambiguous.

>>
>> 3. Is there any concept at all of different address spaces in gdb? Alternatively, is there any way to tell gdb that it should communicate addresses wider than the pointer type of the base architecture (which would allow a Python helper script to define a convenience API)?
>>
> 
> Yes, but the architecture-specific / os-specific layers need to be taught about it. Ideally the compiler would also establish a way of communicating this information to debuggers via the generated DWARF info. See DW_AT_address_class in the DWARF standard.
> 
> Are we talking about short pointers here? Say, a 16-bit pointer in a 32-bit address space?
> 

The compiler communicating it isn't essential here as I'm not trying to 
expose things that are visible to the compiler. What I want to do is to 
be able to use gdb to examine/access non-CPU-visible address spaces, 
like the physical RAM even when it is paged out, or the I/O address space.

The problem becomes that the type of the pointer is lost if it is simply 
converted to an integer, and casting it back to a pointer causes it to 
be truncated back to 16 bits.

I tried doing a Python script like this, but it ran into the above 
problem (plus the connection problem mentioned below):


#!/usr/bin/python

import gdb

class Addrspace (gdb.Function):
     """Convert an address or a pointer to the named address space."""
     def __init__ (self, name, base, mask):
         self.name = name
         self.mask = mask
         self.base = base
         arch = gdb.selected_inferior().architecture()
         bits = 8
         while (mask|base) >> bits:
             bits <<= 1
         self.itype = arch.integer_type(bits, False)
         super (Addrspace, self).__init__ (name)

     def invoke (self, val):
         type = val.type
         return ((val.cast(self.itype) & self.mask) \
  		+ self.base).cast(type)

_io = Addrspace ("io", 0x01000000, 0xffff)


>> 4. There doesn't seem to be a way to trigger a Python function when a connection is *established*, which is as far as I understand the point at which the XML register description is downloaded from the remote, and definitely the first point at which register values can be examined. Am I missing something obvious?
>>
> 
> The Python API is still growing. If this is needed, I'd make the case to the gdb community. Patches are always welcome.




>> 5. I presume the byte codes used for the Z commands are the same as the agent bytecode in Appendix F, even though the latter specifically seems to be referring to tracepoints?
> 
> Yes, the byte codes used for evaluating breakpoint conditions are the same as the ones used for tracepoints. The documentation could make that a bit more explicit.
> 
>>
>> Many thanks,
>>
>>      -hpa
> 


* Re: Questions on XML register descriptions
  2023-12-28 22:09   ` H. Peter Anvin
@ 2023-12-29 13:41     ` Luis Machado
  2024-01-08  6:21       ` H. Peter Anvin
  0 siblings, 1 reply; 5+ messages in thread
From: Luis Machado @ 2023-12-29 13:41 UTC
  To: H. Peter Anvin, gdb

On 12/28/23 22:09, H. Peter Anvin wrote:
> Hi and thanks for responding!
> 
> So more concrete background: the simulator is for a family of Z80 systems developed between 1978 and 1983. Z80 of course has both memory and I/O spaces, and on top of that, the later models in the series (and even the first ones with add-ons) had to do paging tricks to deal with the limitations of a 16-bit address space.
> 
> This is not visible to the CPU, so sizeof(void *) == 2.
> 

Thanks for the info. In the case of Z80, it looks like gdb already has support for it, but I've never used it. You can check what's available in the gdb/z80-tdep.c file.

From skimming through it, I found the following:

  /* Number of bytes used for address:
      2 bytes for all Z80 family
      3 bytes for eZ80 CPUs operating in ADL mode */

So it looks like gdb knows at least *some* of it. I don't know how functional the port is at this point.

> On 12/27/23 03:28, Luis Machado wrote:
>>>
>>> 1. Some registers can't be written to, and some registers may affect other registers. What, if anything, is the best way to handle that?
>>
>> There isn't a single right way. You could teach gdb about those registers explicitly (not ideal) or you could make the remote just ignore or error out when gdb tries to modify such a register.
>>
>> As for side-effects (changing a register has an effect on a different register), this might be best implemented via p/P packets. With those gdb can write out one specific register and then it re-reads the rest of the registers.
>>
>> In such a case, you get the opportunity to apply the side-effects to other registers in time for sending the updated register buffer block.
>>
>> This is a bit odd with the g/G packets, as it gets confusing having to write all registers at once, and you might not know what values have changed or not.
>>
> 
> This is of course true, but it is gdb that selects between gG and pP packets, not the remote (which is what I control).
> 

That's true, but if the remote advertises support for the p/P packets, I think things will work better. So you have at least some control over what gdb will do.

> Now, what I *can* do is to limit the return size of the g packet. If that means gdb will use pP packets to access other registers if/when requested, then that should deal with the problem.
> 

Ok, I went to check this again, as I had forgotten what gdb did. It looks like you're right. From the code comment about handling the g packet reply from the remote:

  /* If this is smaller than we guessed the 'g' packet would be,
     update our records.  A 'g' reply that doesn't include a register's
     value implies either that the register is not available, or that
     the 'p' packet must be used.  */

So it looks like you can force gdb to fall back to the p packet by making the g reply packet of minimum size.
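
A minimal sketch of what a deliberately short g reply could look like on the stub side. The register list, its values, and the cut-off G_PACKET_COUNT are assumptions for illustration; the only protocol facts used are that the g reply is the concatenated hex of the registers in description order, in target byte order.

```python
# Registers in target-description order: (name, size in bytes, value).
# Names, values, and the cut-off below are hypothetical.
REGISTERS = [
    ("af", 2, 0x1234),
    ("bc", 2, 0x0001),
    ("pc", 2, 0x8000),
    ("sysctl", 1, 0xFF),  # system register we would rather serve via 'p'
]

# Hypothetical policy: only the first three registers go into 'g'.
G_PACKET_COUNT = 3


def g_reply(registers, count):
    """Build a 'g' reply covering only the first COUNT registers."""
    return "".join(
        value.to_bytes(size, "little").hex()
        for _name, size, value in registers[:count]
    )
```

With this policy the reply is 6 bytes (12 hex digits) instead of 7, so, per the comment quoted above, the trailing register is either treated as unavailable or fetched with p.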

>>>
>>> 2. How do the G/g commands interact with the XML target descriptions? Do they apply specifically to group="general", or are there other rules?
>>
>> My recollection of it is that G/g will request all of the reported XML registers with non-zero sizes. If p/P is also supported, gdb will attempt to fetch things using g and write things using P.
>>
> 
> That solves a lot of problems, although I expect that there might still be issues with the register cache... but that is a second-order problem if only a bit annoying.
> 
>> I vaguely recall something about gdb not requesting some register in g, but I'm not sure if that's still a thing.
> 
> So gdb doesn't specify anything about what g is supposed to return. The documentation only says "general registers", which *sounds* like group="general", but I have no real idea.
> 
> The description of register groups says that "some register groups" have built-in meaning to gdb, but I cannot find *where* those groups and their associated meanings are described, except that "general", "float" and "vector" are described (but not their specific semantics), and "all" is mentioned elsewhere (the naming of which is kind of obvious).

Yeah, sorry about that. The documentation can be a bit vague in some areas. Though not documented that way, from the code's side gdb lets the remote target choose what to return in the g packet. Of course the remote must honor the order of registers in the g packet, but otherwise it can decide how big/small to make the g packet reply.

This is useful in case you have many system registers. The g packet in those cases would get fairly big, and you don't want to keep sending system register contents back and forth if they are not important.

As for the groups, they mostly affect how gdb displays the registers. For instance, "info reg" will not show floating point/vector/system registers IIRC. I don't think it has an influence on the remote protocol.

> 
> There is an option "save-restore" in the register description; it is unclear to me how it interacts with the gG packets, but it would at least avoid the problem of gdb trying to restore registers that cannot be restored or would be ambiguous.
> 

save-restore controls whether gdb is required to save those registers in case it wants to make a manual function call (say, with a "call" command). In that case, gdb will save the register context before calling the function and will restore it afterwards.

If a register is not writable, it shouldn't have save-restore enabled.
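
As a rough sketch, a stub could generate the corresponding part of the target description like this. The feature name org.example.emulator.sys and the two registers are made up; the reg attributes used (name, bitsize, group, save-restore) are the ones from the target description format, where save-restore defaults to "yes". A complete description would also need the surrounding target element and the core CPU feature.

```python
import xml.etree.ElementTree as ET

# Hypothetical system registers: (name, bitsize, writable).
SYSTEM_REGS = [
    ("bankbase", 16, False),
    ("iolatch", 8, True),
]


def build_feature(regs, feature_name="org.example.emulator.sys"):
    """Return a <feature> element (as text) describing REGS.

    Read-only registers get save-restore="no" so that gdb does not try
    to write them back after an inferior function call.
    """
    feature = ET.Element("feature", name=feature_name)
    for name, bitsize, writable in regs:
        attrs = {"name": name, "bitsize": str(bitsize), "group": "system"}
        if not writable:
            attrs["save-restore"] = "no"
        ET.SubElement(feature, "reg", attrs)
    return ET.tostring(feature, encoding="unicode")
```

Note that save-restore="no" only keeps gdb from restoring the register around function calls; the stub still has to reject or ignore explicit writes.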

>>>
>>> 3. Is there any concept at all of different address spaces in gdb? Alternatively, is there any way to tell gdb that it should communicate addresses wider than the pointer type of the base architecture (which would allow a Python helper script to define a convenience API)?
>>>
>>
>> Yes, but the architecture-specific / os-specific layers need to be taught about it. Ideally the compiler would also establish a way of communicating this information to debuggers via the generated DWARF info. See DW_AT_address_class in the DWARF standard.
>>
>> Are we talking about short pointers here? Say, a 16-bit pointer in a 32-bit address space?
>>
> 
> The compiler communicating it isn't essential here as I'm not trying to expose things that are visible to the compiler. What I want to do is to be able to use gdb to examine/access non-CPU-visible address spaces, like the physical RAM even when it is paged out, or the I/O address space.
> 
> The problem becomes that the type of the pointer is lost if it is simply converted to an integer, and casting it back to a pointer causes it to be truncated back to 16 bits.

Yeah, so this is the type of knowledge gdb must have in order to handle things correctly. Technically you could handle things correctly from the remote's side by inspecting the memory read/write packets, determining what the address space should be, what the pointer size might be and then fetching that information from whichever address space it belongs to.
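
A minimal sketch of that remote-side decoding, assuming gdb can be made to send addresses wider than 16 bits in the first place (which is the open problem here). The layout, with an address-space tag in bits 24 and up, is hypothetical and simply mirrors the 0x01000000 base used for "io" in the script quoted below; only the "m addr,length" packet shape, with both fields in hex, comes from the protocol.

```python
# Hypothetical mapping from the tag in address bits 24+ to an address space.
SPACES = {
    0x00: "cpu",      # what the Z80 itself sees
    0x01: "io",       # I/O port space
    0x02: "physram",  # physical RAM, regardless of paging
}

OFFSET_MASK = 0xFFFFFF


def parse_m_packet(payload):
    """Split an 'm addr,length' packet into (space, offset, length).

    The space is None if the tag is not one we know about.
    """
    addr_text, _, len_text = payload[1:].partition(",")
    addr = int(addr_text, 16)
    length = int(len_text, 16)
    space = SPACES.get(addr >> 24)
    return space, addr & OFFSET_MASK, length
```

For example, parse_m_packet("m1000010,4") yields ("io", 0x10, 4), while a plain 16-bit address such as "m8000,2" stays in the "cpu" space.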

> 
> I tried doing a Python script like this, but it ran into the above problem (plus the connection problem mentioned below):
> 
> 

What particular error do you get, out of curiosity?

> #!/usr/bin/python
> 
> import gdb
> 
> class Addrspace (gdb.Function):
>     """Convert an address or a pointer to the named address space."""
>     def __init__ (self, name, base, mask):
>         self.name = name
>         self.mask = mask
>         self.base = base
>         arch = gdb.selected_inferior().architecture()
>         bits = 8
>         while (mask|base) >> bits:
>             bits <<= 1
>         self.itype = arch.integer_type(bits, False)
>         super (Addrspace, self).__init__ (name)
> 
>     def invoke (self, val):
>         type = val.type
>         return ((val.cast(self.itype) & self.mask) \
>          + self.base).cast(type)
> 
> _io = Addrspace ("io", 0x01000000, 0xffff)
> 
> 
>>> 4. There doesn't seem to be a way to trigger a Python function when a connection is *established*, which is as far as I understand the point at which the XML register description is downloaded from the remote, and definitely the first point at which register values can be examined. Am I missing something obvious?
>>>
>>
>> The Python API is still growing. If this is needed, I'd make the case to the gdb community. Patches are always welcome.
> 
> 
> 
> 
>>> 5. I presume the byte codes used for the Z commands are the same as the agent bytecode in Appendix F, even though the latter specifically seems to be referring to tracepoints?
>>
>> Yes, the byte codes used for evaluating breakpoint conditions are the same as the ones used for tracepoints. The documentation could make that a bit more explicit.
>>
>>>
>>> Many thanks,
>>>
>>>      -hpa
>>



* Re: Questions on XML register descriptions
  2023-12-29 13:41     ` Luis Machado
@ 2024-01-08  6:21       ` H. Peter Anvin
  0 siblings, 0 replies; 5+ messages in thread
From: H. Peter Anvin @ 2024-01-08  6:21 UTC
  To: Luis Machado, gdb

On 12/29/23 05:41, Luis Machado via Gdb wrote:
> 
> Ok, I went to check this again, as I had forgotten what gdb did. It looks like you're right. From the code comment about handling the g packet reply from the remote:
> 
>    /* If this is smaller than we guessed the 'g' packet would be,
>       update our records.  A 'g' reply that doesn't include a register's
>       value implies either that the register is not available, or that
>       the 'p' packet must be used.  */
> 
> So it looks like you can force gdb to fall back to the p packet by making the g reply packet of minimum size.
> 

That's great.

>>
>> The compiler communicating it isn't essential here as I'm not trying to expose things that are visible to the compiler. What I want to do is to be able to use gdb to examine/access non-CPU-visible address spaces, like the physical RAM even when it is paged out, or the I/O address space.
>>
>> The problem becomes that the type of the pointer is lost if it is simply converted to an integer, and casting it back to a pointer causes it to be truncated back to 16 bits.
> 
> Yeah, so this is the type of knowledge gdb must have in order to handle things correctly. Technically you could handle things correctly from the remote's side by inspecting the memory read/write packets, determining what the address space should be, what the pointer size might be and then fetching that information from whichever address space it belongs to.
> 
>>
>> I tried doing a Python script like this, but it ran into the above problem (plus the connection problem mentioned below):
>>
> 
> What particular error do you get, out of curiosity?
> 
>> #!/usr/bin/python
>>
>> import gdb
>>
>> class Addrspace (gdb.Function):
>>      """Convert an address or a pointer to the named address space."""
>>      def __init__ (self, name, base, mask):
>>          self.name = name
>>          self.mask = mask
>>          self.base = base
>>          arch = gdb.selected_inferior().architecture()
>>          bits = 8
>>          while (mask|base) >> bits:
>>              bits <<= 1
>>          self.itype = arch.integer_type(bits, False)
>>          super (Addrspace, self).__init__ (name)
>>
>>      def invoke (self, val):
>>          type = val.type
>>          return ((val.cast(self.itype) & self.mask) \
>>           + self.base).cast(type)
>>
>> _io = Addrspace ("io", 0x01000000, 0xffff)
>>

The problem is that when the address is cast back to a pointer, it is 
truncated to the pointer size.

Having thought about this some more I wonder if it is possible to create 
a new Type similar to a normal pointer but larger, and use that as the 
new type instead of val.type.

	-hpa

