[RFC] TARGET_CHAR_BIT != HOST_CHAR_BIT

public inbox for gdb@sourceware.org

* [RFC] TARGET_CHAR_BIT != HOST_CHAR_BIT
@ 2003-05-29 23:22 Svein E. Seldal
  2003-06-01 18:13 ` Andrew Cagney
  0 siblings, 1 reply; 5+ messages in thread
From: Svein E. Seldal @ 2003-05-29 23:22 UTC (permalink / raw)
  To: gdb

Hi,

To be able to port the tic4x target, I am forced to take action towards 
supporting TARGET_CHAR_BIT != 8 and TARGET_CHAR_BIT != HOST_CHAR_BIT. 
(TARGET_CHAR_BIT is 32 on this specific target.)

First up for me is the load_section_callback() function in symfile.c, 
which handles the "load" command when doing remote debugging. It downloads 
x bytes to the target and then increases the lma by x. Since 
TARGET_CHAR_BIT != HOST_CHAR_BIT, this isn't correct.

The simplest way to fix this issue (more or less globally) would be to 
declare a macro or function, target_addr_increase_from_buffersize() or 
something similar, that calculates the lma increase from x with the aid of 
TARGET_CHAR_BIT (and HOST_CHAR_BIT).
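A minimal sketch of what such a helper might look like, assuming an 8-bit host byte and a target width that is a multiple of 8. The function name is the one floated above; the explicit target_char_bit parameter and the SKETCH_HOST_CHAR_BIT constant are hypothetical stand-ins for however the values would really be obtained.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: the host uses 8-bit bytes.  */
#define SKETCH_HOST_CHAR_BIT 8

/* Convert a buffer size in host bytes into the number of target
   addresses that buffer covers.  TARGET_CHAR_BIT is the number of bits
   stored at one target address (32 on the tic4x).  */
static size_t
target_addr_increase_from_buffersize (size_t host_bytes,
                                      unsigned target_char_bit)
{
  size_t bits = host_bytes * SKETCH_HOST_CHAR_BIT;

  /* A partial target unit cannot be addressed, so insist on a whole
     number of units.  */
  assert (target_char_bit != 0 && bits % target_char_bit == 0);
  return bits / target_char_bit;
}
```

For a byte-addressed target (target_char_bit of 8) the function returns its argument unchanged, so existing ports would see no difference.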

We need to decide how to approach this issue. I will keep working on 
the tic4x port in my local sandbox and make this work for 
me. But as it requires me to make global adaptations, I would surely 
like to do this with the blessing of the community.

1) IMHO we can assume that TARGET_CHAR_BIT and HOST_CHAR_BIT are multiples 
of 8. I haven't seen _any_ targets yet that break this rule.

2) How should we incorporate these macros/functions into the 
gdbarch model?  Maybe we should add a dedicated gdbarch "attribute" that 
targets can define.

3) We need to hunt down any portions of the gdb code that have 
issues with this. I can think of several places where this will be a 
problem: all target pointer arithmetic done on the host, buffers and 
structs, etc. Looking briefly at my old tic4x-gdb patch (pre 5.0), I'd 
estimate approx. 500 changes to global sources, in over 50 files.

Regards,
Svein


* Re: [RFC] TARGET_CHAR_BIT != HOST_CHAR_BIT
  2003-05-29 23:22 [RFC] TARGET_CHAR_BIT != HOST_CHAR_BIT Svein E. Seldal
@ 2003-06-01 18:13 ` Andrew Cagney
  2003-06-02  2:22   ` Svein E. Seldal
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Cagney @ 2003-06-01 18:13 UTC (permalink / raw)
  To: Svein E. Seldal; +Cc: gdb

> Hi,
> 
> To be able to port the tic4x target, I am forced to take action towards supporting TARGET_CHAR_BIT != 8 and TARGET_CHAR_BIT != HOST_CHAR_BIT. (TARGET_CHAR_BIT is 32 on this specific target.)
> 
> First up for me is the load_section_callback() function in symfile.c, which handles the "load" command when doing remote debugging. It downloads x bytes to the target and then increases the lma by x. Since TARGET_CHAR_BIT != HOST_CHAR_BIT, this isn't correct.

So the problem is defining how many host|target bytes are transferred by 
a specified length?  Or whether the length is in host or target space?

When you say TARGET_CHAR_BIT is 32, what exactly do you mean?  Is 32 
bits a fundamental limitation of the hardware or a data type selected 
for efficiency reasons?  Does debug info indicate that ``char'' is 8 
bits or more in size?  GDB uses TARGET_CHAR_BIT, with stabs, to decide the 
size of 'char'.

The d10v's data space is addressable down to an 8 bit boundary, but its 
code space is addressable down to only 32 bits.  Both code and data 
pointers are mapped onto a single 8 bit addressable CORE_ADDR (see 
d10v-tdep.c pointer to address and address to pointer).

I suspect that what's been proposed here would [further] overload the 
already overloaded TARGET_CHAR_BIT.  Is something separate needed?

> The simplest way to fix this issue (more or less globally) would be to declare a macro or function, target_addr_increase_from_buffersize() or something similar, that calculates the lma increase from x with the aid of TARGET_CHAR_BIT (and HOST_CHAR_BIT).

> We need to make a decision of how to approach this issue. I will still keep porting the tic4x port in my local sandbox, and make this work for me. But as it requires me to make global adaptations, I would surely like to do this with the blessing of the community.
> 
> 1) IMHO we can assume that TARGET_CHAR_BIT and HOST_CHAR_BIT are multiples of 8. I haven't seen _any_ targets yet that break this rule.

Hopefully any hardware not complying with this has been switched off :-)

> 2) How should we incorporate these macros/functions into the gdbarch model?  Maybe we should add a dedicated gdbarch "attribute" that targets can define.

The mechanism should always be present and should always be used.

> 3) We need to hunt down any portions of the gdb code that have issues with this. I can think of several places where this will be a problem: all target pointer arithmetic done on the host, buffers and structs, etc. Looking briefly at my old tic4x-gdb patch (pre 5.0), I'd estimate approx. 500 changes to global sources, in over 50 files.

Keep in mind that this is so weird that the average programmer will 
always forget to use this mechanism.  Unless, somehow, it's made very 
natural.

Andrew


* Re: [RFC] TARGET_CHAR_BIT != HOST_CHAR_BIT
  2003-06-01 18:13 ` Andrew Cagney
@ 2003-06-02  2:22   ` Svein E. Seldal
  2003-06-03 15:10     ` Andrew Cagney
  0 siblings, 1 reply; 5+ messages in thread
From: Svein E. Seldal @ 2003-06-02  2:22 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: gdb

Andrew Cagney wrote:
> So the problem is defining how many host|target bytes are transferred by 
> a specified length?  Or whether the length is in host or target space?
> 
> The d10v's data space is addressable down to an 8 bit boundary, but its 
> code space is addressable down to only 32 bits.  Both code and data 
> pointers are mapped onto a single 8 bit addressable CORE_ADDR (see 
> d10v-tdep.c pointer to address and address to pointer).

When gdb is about to download a large amount of data over a remote 
interface, it breaks it up into smaller packets. These packets (the 
'M' packets) hold the destination address as their first argument. The 
download of the first 'M' packet goes well, but the successive M's 
within that segment fail. GDB assumes that when it has downloaded n 
bytes, it should increase the lma address by n for the next packet.

The problem is that the tic4x target doesn't work this way. It has the 
following property: sizeof(char)=sizeof(short)=sizeof(int)=sizeof(long)=1 
*and* each of them holds 32 bits of information. The tic4x target has 
absolutely no concept of bytes, only a data bus 32 bits wide. 
Incrementing a data pointer by one increases the physical address by one, 
yet each address spans 32 bits. Thus, to store the information for a 
particular address, you need 32 bits of storage. E.g.

	char foo[2] = { 1, 2 };

Is located in memory like this:

0x1000: 0x00000001
0x1001: 0x00000002

So you see, if a segment contains 256 bytes, GDB still needs to download 
256 bytes to the target (that's obvious), but the address-span of those 
256 bytes is only 64 (on target). So any lma address increases must be 
divided by 4 to be correct on this target.
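
The arithmetic can be sketched as a small loop that works out the address each download packet should carry. This is only an illustration under stated assumptions: packet_addresses and its parameters are hypothetical names, not part of gdb's remote code, and the host byte is taken to be 8 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the target address for each packet of a
   chunked download.  CHUNK is the packet payload in 8-bit host bytes,
   UNIT_BYTES is the number of host bytes per target address (4 when one
   address holds 32 bits).  Stores up to MAX packet addresses in OUT and
   returns the number of packets needed.  */
static size_t
packet_addresses (unsigned long lma, size_t total, size_t chunk,
                  size_t unit_bytes, unsigned long *out, size_t max)
{
  size_t sent = 0, n = 0;

  assert (chunk != 0 && unit_bytes != 0 && chunk % unit_bytes == 0);
  while (sent < total)
    {
      size_t len = total - sent < chunk ? total - sent : chunk;

      if (n < max)
        out[n] = lma;
      n++;
      sent += len;
      /* Advance by the addresses covered, not by the host bytes sent.  */
      lma += len / unit_bytes;
    }
  return n;
}
```

With a 256-byte segment at 0x1000, a 64-byte payload and 4 host bytes per address, the packets start at 0x1000, 0x1010, 0x1020 and 0x1030, covering 64 addresses in total. Passing unit_bytes of 1 gives the behaviour gdb has today.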

As for the d10v solution, the tic4x is similar to the code space of that 
target. You could implement gdb this way, but I think you'll soon wind 
up in the same trouble: a char is still 32 bits, not the hardcoded 
8 bits. All accesses to addresses not on a 32-bit boundary will be invalid. 
Absolutely all addresses coming from binutils/BFD must be adjusted, 
because they are 32-bit oriented, not byte-oriented...

...but still, I'll keep an open mind to this turning out to be an 
implementable solution.

> I suspect that what's been proposed here would [further] overload the 
> already overloaded TARGET_CHAR_BIT.  Is something separate needed?

No and yes. Yes, because TARGET_CHAR_BIT doesn't affect the packet 
download lma incrementing. And no, because there already exists a 
set_gdbarch_char_bit() setting. But it's commented out, so it's not in 
use. This function/setting is probably what we would need for this port, 
if we could define it this way: TARGET_CHAR_BIT means "the number of 
bits required to represent the information stored in one unique address".

So my suggestion is that we reintroduce this setting, and use a macro 
like this to replace the code where needed.

#define TARGET_LENGTH(n) ((n) * HOST_CHAR_BIT / TARGET_CHAR_BIT)

(only gdbarchified, of course)

If the default value of the set_gdbarch_char_bit() setting is 8, 
then it won't matter for most targets, as they don't need to define 
it or change its value. And it works transparently for everyone.

Why has the set_gdbarch_char_bit() setting been disabled?

> Keep in mind that this is so weird that the average programmer will 
> always forget to use this mechanism.  Unless, somehow, it's made very 
> natural.

Yeah, I know this may sound weird to some programmers. But it isn't 
unusual in the DSP world, as DSPs are usually word-oriented to align better 
with the information they are processing. I will still try to press on 
for this feature, as I know that other Texas Instruments processors 
(which have gcc support) have the same property.

Regards,
Svein


* Re: [RFC] TARGET_CHAR_BIT != HOST_CHAR_BIT
  2003-06-02  2:22   ` Svein E. Seldal
@ 2003-06-03 15:10     ` Andrew Cagney
  2003-06-07 11:40       ` Svein E. Seldal
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Cagney @ 2003-06-03 15:10 UTC (permalink / raw)
  To: Svein E. Seldal; +Cc: gdb

> When gdb is about to download a large amount of data over a remote interface, it breaks it up into smaller packets. These packets (the 'M' packets) hold the destination address as their first argument. The download of the first 'M' packet goes well, but the successive M's within that segment fail. GDB assumes that when it has downloaded n bytes, it should increase the lma address by n for the next packet.
> 
> The problem is that the tic4x target doesn't work this way. It has the following property: sizeof(char)=sizeof(short)=sizeof(int)=sizeof(long)=1 *and* each of them holds 32 bits of information. The tic4x target has absolutely no concept of bytes, only a data bus 32 bits wide. Incrementing a data pointer by one increases the physical address by one, yet each address spans 32 bits. Thus, to store the information for a particular address, you need 32 bits of storage. E.g.
> 
>     char foo[2] = { 1, 2 };
> 
> Is located in memory like this:
> 
> 0x1000: 0x00000001
> 0x1001: 0x00000002

There are two things at play here:

- the compiler's decision on how to implement char

The original alpha, for instance, had 8 bit addressable pointers yet the 
hardware could only read/write 64 bit words.   Access to anything 
smaller than 64 bits was handled in software.  Having the tic4x do 
something similar (presumably with long pointers) is just a ``small 
matter of programming''.

- physical limitations of the hardware

This is the important one.  The data space pointers for this hardware 
identify 32 bit words, not 8 bit bytes.

> So you see, if a segment contains 256 bytes, GDB still needs to download 256 bytes to the target (that's obvious), but the address-span of those 256 bytes is only 64 (on target). So any lma address increases must be divided by 4 to be correct on this target.
> 
> As for the d10v solution, the tic4x is similar to the code space of that target. You could implement gdb this way, but I think you'll soon wind up in the same trouble: a char is still 32 bits, not the hardcoded 8 bits. All accesses to addresses not on a 32-bit boundary will be invalid. Absolutely all addresses coming from binutils/BFD must be adjusted, because they are 32-bit oriented, not byte-oriented...

I think this needs to be pursued a bit more before being discarded.

>> I suspect that what's been proposed here would [further] overload the already overloaded TARGET_CHAR_BIT.  Is something separate needed?

> No and yes. Yes, because TARGET_CHAR_BIT doesn't affect the packet download lma incrementing. And no, because there already exists a set_gdbarch_char_bit() setting. But it's commented out, so it's not in use. This function/setting is probably what we would need for this port, if we could define it this way: TARGET_CHAR_BIT means "the number of bits required to represent the information stored in one unique address".

To expand my point.  TARGET_CHAR_BIT is used to identify:

- bitsizeof (char)
- the implied address alignment
- anything else such as debug info?

Those two are, as I noted above, orthogonal.  The problem, I think, is 
that GDB has used them interchangeably.

To address this I can see two models.

- assume an 8 bit host byte size (aka bfd_byte)

This is effectively what GDB does now.  It, via pointer_to_address, maps 
a target pointer onto a canonical CORE_ADDR.  For your architecture, a 
read of the word pointed at by 0x1000 would be converted into a read of 
four 8 bit bytes at 0x4000.

- use the target byte size

And have any memory manipulations try to remember which (host or target) 
is used for any length computations.

I have a feeling that the first will be much easier.  All, in theory, 
that is needed is for this target to implement a pointer_to_address that 
does the above manipulation (and then stop GDB trying to use 
TARGET_CHAR_BIT when moving memory around).
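
A minimal sketch of the manipulation in the first model, assuming one target address holds 32 bits. The sketch_ names and the simplified signatures are hypothetical; the actual pointer_to_address hook in GDB takes different arguments, so this only illustrates the scaling.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for GDB's CORE_ADDR; the real type and the real
   pointer_to_address signature differ from this sketch.  */
typedef uint64_t sketch_core_addr;

/* Assumed for illustration: one target address holds 32 bits, i.e.
   four 8-bit bytes.  */
#define SKETCH_BYTES_PER_WORD 4

/* Map a word-granular target pointer onto a byte-granular address.  */
static sketch_core_addr
sketch_pointer_to_address (uint32_t target_ptr)
{
  return (sketch_core_addr) target_ptr * SKETCH_BYTES_PER_WORD;
}

/* Inverse mapping.  Only addresses on a word boundary correspond to a
   real target pointer.  */
static uint32_t
sketch_address_to_pointer (sketch_core_addr addr)
{
  assert (addr % SKETCH_BYTES_PER_WORD == 0);
  return (uint32_t) (addr / SKETCH_BYTES_PER_WORD);
}
```

The target pointer 0x1000 maps to the byte address 0x4000 and back again, matching the example given for the first model.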

I've also got reservations over making the semantics of memory transfer 
operations architecture dependent.  I think memory transfers should be 
defined in an architecture independent way.

Anyway, can you try setting up pointer_to_address and see what happens?

The problem with this approach is that GDB's CORE_ADDRs become visible 
to the user, viz.:

> (gdb) print/x $pc
> $7 = 0x10140b8

That's the PC as a GDB CORE_ADDR.

> (gdb) print/x (int)$pc
> $8 = 0x502e

Whereas that's the actual pointer value.

> (gdb) x/i $pc
> 0x10140b8 <main+20>:    ld      r0, @r11        ||      nop
> (gdb) x/4b $pc
> 0x10140b8 <main+20>:    0x30    0x0b    0x5e    0x00

In both cases an examine works as expected.  Note that x/b examines an 8 
bit byte and not a 16 bit instruction word.

> (gdb) x/4b (@code *)0x502e
> A syntax error in expression, near `*)0x502e'.

Hmm, it would be better if that worked.  Would save the need to do:

> (gdb) x/4b (@code void *)0x502e
> 0x10140b8 <main+20>:    0x30    0x0b    0x5e    0x00

But note that this code pointer is very different to:

> (gdb) x/4b (char *)0x502e
> 0x200502e:      0x00    0x00    0x00    0x00

which created a pointer into the data space.

I should note that having CORE_ADDR visible is a blessing in disguise. 
It makes operations such as x/b meaningful.

Andrew


* Re: [RFC] TARGET_CHAR_BIT != HOST_CHAR_BIT
  2003-06-03 15:10     ` Andrew Cagney
@ 2003-06-07 11:40       ` Svein E. Seldal
  0 siblings, 0 replies; 5+ messages in thread
From: Svein E. Seldal @ 2003-06-07 11:40 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: gdb

Andrew Cagney wrote:
> There are two things at play here:
> 
> - the compiler's decision on how to implement char
> 
> The original alpha, for instance, had 8 bit addressable pointers yet the 
> hardware could only read/write 64 bit words.   Access to anything 
> smaller than 64 bits was handled in software.  Having the tic4x do 
> something similar (presumably with long pointers) is just a ``small 
> matter of programming''.

The difference between the alpha and the tic4x is that on the tic4x a 
pointer can only read 32 bits: nothing more, nothing less. An address 
points to a 32-bit location, never to bytes.

> - physical limitations of the hardware
> 
> This is the important one.  The data space pointers for this hardware 
> identify 32 bit words, not 8 bit bytes.

Not just data pointers. All kinds of pointers behave this way.

> To expand my point.  TARGET_CHAR_BIT is used to identify:
> 
> - bitsizeof (char)
> - the implied address alignment
> - anything else such as debug info?

If you are to apply all these conditions to the tic4x target, you'll 
wind up with a TARGET_CHAR_BIT of 32.

> Those two are, as I noted above, orthogonal.  The problem, I think, is 
> that GDB has used them interchangeably.
> 
> To address this I can see two models.
> 
> - assume an 8 bit host byte size (aka bfd_byte)
> 
> This is effectively what GDB does now.  It, via pointer_to_address, maps 
> a target pointer onto a canonical CORE_ADDR.  For your architecture, a 
> read of the word pointed at by 0x1000 would be converted into a read of 
> four 8 bit bytes at 0x4000.
> 
> - use the target byte size
> 
> And have any memory manipulations try to remember which (host or target) 
> is used for any length computations.
> 
> I have a feeling that the first will be much easier.  All, in theory, 
> that is needed is for this target to implement a pointer_to_address that 
> does the above manipulation (and then stop GDB trying to use 
> TARGET_CHAR_BIT when moving memory around).

I agree that the first method could be a way of solving this issue. But 
I have the impression that this will be a hack. Implementing the tic4x 
target with byte addresses in gdb is something done solely to 
satisfy gdb! The target _never_ operates with any of these numbers. 
Please remember that the pointers coming from BFD are still in the 
target format.

> I've also got reservations over making the semantics of memory transfer 
> operations architecture dependent.  I think memory transfers should be 
> defined in an architecture independent way.

Even if this is done through a gdbarch function that by default works as 
it does today?

> The problem with this approach is that GDB's CORE_ADDRs become visible 
> to the user, viz.:

This will certainly confuse the user, because these numbers are never 
present on the target.

> 
>> (gdb) print/x $pc
>> $7 = 0x10140b8
> 
> 
> That's the PC as a GDB CORE_ADDR.
> 
>> (gdb) print/x (int)$pc
>> $8 = 0x502e
> 
> 
> Whereas that's the actual pointer value.
> 
>> (gdb) x/i $pc
>> 0x10140b8 <main+20>:    ld      r0, @r11        ||      nop
>> (gdb) x/4b $pc
>> 0x10140b8 <main+20>:    0x30    0x0b    0x5e    0x00
> 
> 
> In both cases an examine works as expected.  Note that x/b examines an 8 
> bit byte and not a 16 bit instruction word.
> 
>> (gdb) x/4b (@code *)0x502e
>> A syntax error in expression, near `*)0x502e'.
> 
> 
> Hmm, it would be better if that worked.  Would save the need to do:
> 
>> (gdb) x/4b (@code void *)0x502e
>> 0x10140b8 <main+20>:    0x30    0x0b    0x5e    0x00
> 
> 
> But note that this code pointer is very different to:
> 
>> (gdb) x/4b (char *)0x502e
>> 0x200502e:      0x00    0x00    0x00    0x00
> 
> 
> which created a pointer into the data space.

To me it seems like the internal gdb address is too visible to the user. 
Since this internal gdb address is only present to satisfy the 
gdb implementation, the user should _never_ be confronted with these 
hacky host-gdb-addresses.

1)
(gdb) x/4b (@code void *)0x502e
0x10140b8 <main+20>:    0x30    0x0b    0x5e    0x00

IMHO the only correct answer for this target is

0x502e <main+20>:   0x300b5e00

2)
How does gdb handle this:
	struct buffer {
		char part1;
		char part2;
	};

Both of these members will be allocated a one-address location, but with 
the length of 32-bits. For the target: sizeof(struct buffer) == 2.

3)
What happens if a user requests a data dump of length 4? By our 
definition of a char, a dump with "x/4b" should dump 4x 32 bits of 
information.

The user will also expect this: when he compiles "char buffer[10];" 
it will occupy 10x 32-bit memory locations. And when he 
issues "x/10b" (10 because he wants 10 numbers, and b because he knows 
that the buffer is char), he will expect a dump of the entire buffer.
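
A rough sketch of what counting in target chars would mean for such a dump, assuming 8-bit host bytes and that the four host bytes of each 32-bit unit arrive most significant byte first. read_target_chars is a hypothetical name, not an existing gdb function, and the byte order is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble the COUNT 32-bit target chars held in host buffer BUF into
   OUT.  Assumptions of this sketch: 8-bit host bytes, and the four
   bytes of each unit stored most significant byte first.  */
static void
read_target_chars (const unsigned char *buf, size_t count, uint32_t *out)
{
  assert (buf != NULL && out != NULL);
  for (size_t i = 0; i < count; i++)
    {
      /* Each target char occupies four consecutive host bytes.  */
      const unsigned char *p = buf + 4 * i;

      out[i] = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
               | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
    }
}
```

Under these assumptions a request for 10 chars consumes 40 host bytes and yields 10 values, one per target address.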

 > I should note that having CORE_ADDR visible is a blessing in disguise.
 > It makes operations such as x/b meaningful.

What do you mean?

Regards,
Svein

