public inbox for gdb-patches@sourceware.org
* gdb symbol relocation
@ 2022-04-04 14:14 Metzger, Markus T
  2022-04-04 15:36 ` Tom Tromey
  0 siblings, 1 reply; 5+ messages in thread
From: Metzger, Markus T @ 2022-04-04 14:14 UTC (permalink / raw)
  To: Tom Tromey, Pedro Alves; +Cc: gdb-patches

Hello,

While debugging some issue with symbol lookup I came across some oddities that I fail to understand.  I hope you can help me understand this better.

For every symbol GDB stores, among other things, some 'value' and the 'section index'.

For minimal symbols, the value appears to be the section-relative address.  I.e. MSYMBOL_VALUE_ADDRESS (objf, minsym) expands to get_msymbol_address (objf, minsym), which returns

  return (minsym->value.address
              + objf->section_offsets[minsym->section_index ()]);

For normal/debug info symbols, however, we seem to store the absolute/relocated address.  I.e. SYMBOL_VALUE_ADDRESS (sym) expands to get_symbol_address (sym), which returns

  return sym->value.address;
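
To make the contrast concrete, here is a minimal, self-contained sketch of the two storage schemes.  It is not GDB code: the toy_* types and functions are hypothetical stand-ins that only mirror the shape of the two accessors quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using CORE_ADDR = std::uint64_t;

// Hypothetical stand-in for an objfile: one load offset per section.
struct toy_objfile
{
  std::vector<CORE_ADDR> section_offsets;
};

// Minimal-symbol style: the stored value is not yet relocated.
struct toy_minsym
{
  CORE_ADDR unrelocated_address;
  std::size_t section_index;
};

// Full-symbol style: the stored value is already relocated.
struct toy_symbol
{
  CORE_ADDR relocated_address;
};

// Same shape as get_msymbol_address: the section offset is added on
// every read, so the objfile is a required argument.
CORE_ADDR
toy_msymbol_address (const toy_objfile &objf, const toy_minsym &minsym)
{
  assert (minsym.section_index < objf.section_offsets.size ());
  return (minsym.unrelocated_address
          + objf.section_offsets[minsym.section_index]);
}

// Same shape as get_symbol_address: the value is returned as stored.
CORE_ADDR
toy_symbol_address (const toy_symbol &sym)
{
  return sym.relocated_address;
}
```

With section offsets {0x1000, 0x2000}, a toy_minsym holding 0x40 in section 1 reads back as 0x2040, while a toy_symbol must already hold 0x2040 to name the same location.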

When creating the symbol for a variable with static storage duration, we do (in gdb/dwarf2/read.c: var_decode_location)

              if (block->data[0] == DW_OP_addr)
                SET_SYMBOL_VALUE_ADDRESS
                  (sym, cu->header.read_address (objfile->obfd,
                                                                 block->data + 1,
                                                                 &dummy));
              else
                SET_SYMBOL_VALUE_ADDRESS
                  (sym, read_addr_index_from_leb128 (cu, block->data + 1,
                                                                        &dummy));
              sym->set_aclass_index (LOC_STATIC);
              fixup_symbol_section (sym, objfile);
              SET_SYMBOL_VALUE_ADDRESS
                (sym,
                 SYMBOL_VALUE_ADDRESS (sym)
                 + objfile->section_offsets[sym->section_index ()]);

We read the object-file-relative address from the debug info, then fill
in the symbol's section index, only to update the address to the full, relocated
address.

In lookup_minimal_symbol_by_pc_name (), which gets called from fixup_symbol_section () via fixup_section (), we compare the relocated address of the minimal symbol against the object-file-relative address of the symbol under construction

                  if (MSYMBOL_VALUE_ADDRESS (objfile, msymbol) == pc
                          && strcmp (msymbol->linkage_name (), name) == 0)
                        return msymbol;

The only way this can match is for absolute object files, i.e. when all section offsets are zero.  In every other case the lookup fails, so we instead walk over the section header table and come up with the same result the minimal symbol would have given us.  Should we compare the section-/object-file-relative addresses, instead?  Or omit the minsymbol search and look straight at the section table in fixup_section ()?
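
The mismatch can be reproduced outside GDB.  The sketch below is a hypothetical model, not the real lookup_minimal_symbol_by_pc_name: sketch_minsym and sketch_lookup_by_pc_name are invented names, and the section offsets are passed in directly instead of coming from an objfile.  It keeps only the comparison quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using CORE_ADDR = std::uint64_t;

// Hypothetical minimal symbol: a name plus an unrelocated address.
struct sketch_minsym
{
  const char *name;
  CORE_ADDR unrelocated_address;
  std::size_t section_index;
};

// Compare the *relocated* minsym address against PC, as the quoted
// condition does.  Returns nullptr when nothing matches.
const sketch_minsym *
sketch_lookup_by_pc_name (const std::vector<sketch_minsym> &minsyms,
                          const std::vector<CORE_ADDR> &section_offsets,
                          CORE_ADDR pc, const char *name)
{
  for (const sketch_minsym &m : minsyms)
    {
      assert (m.section_index < section_offsets.size ());
      CORE_ADDR relocated
        = m.unrelocated_address + section_offsets[m.section_index];
      if (relocated == pc && std::strcmp (m.name, name) == 0)
        return &m;
    }
  return nullptr;
}
```

Looking up a symbol stored at 0x40 with PC 0x40 fails as soon as its section offset is non-zero (say 0x1000), and succeeds either when PC is the relocated 0x1040 or when the offset is zero, which is the absolute object file case.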

I admit I only looked at this one case, so far, so maybe it makes more sense in the general case.  But then, we'd only want to fixup a symbol's section if we don't know it, yet, so we'd almost always start with a relative address (except in the absolute object file case), which shouldn't match any relocated minsymbol address.

Why are minimal symbol values section-relative whereas normal symbol values are absolute?

Why section-relative and not segment-relative?  The section header table is optional and (debug symbols) relocation happens on segment level.

Thanks,
Markus.
Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de <http://www.intel.de>
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva  
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


* Re: gdb symbol relocation
  2022-04-04 14:14 gdb symbol relocation Metzger, Markus T
@ 2022-04-04 15:36 ` Tom Tromey
  2022-04-04 15:43   ` Pedro Alves
  0 siblings, 1 reply; 5+ messages in thread
From: Tom Tromey @ 2022-04-04 15:36 UTC (permalink / raw)
  To: Metzger, Markus T via Gdb-patches
  Cc: Tom Tromey, Pedro Alves, Metzger, Markus T

> In lookup_minimal_symbol_by_pc_name (), which gets called from
> fixup_symbol_section () via fixup_section (), we compare the relocated
> address of the minimal symbol against the object-file-relative address
> of the symbol under construction

>                   if (MSYMBOL_VALUE_ADDRESS (objfile, msymbol) == pc
>                           && strcmp (msymbol->linkage_name (), name) == 0)
>                         return msymbol;

> The only way this can match is for absolute object files.  So we
> instead walk over the section table header and come up with the same
> result as the minimal symbol.  Should we compare the
> section-/object-file-relative addresses, instead?  Or omit the
> minsymbol search and look straight at the section table in
> fixup_section ()?

That sure seems like an oversight in this code, but I'd imagine that
changing lookup_minimal_symbol_by_pc_name would have other undesired
effects.

Maybe that check could just be removed from fixup_section, though.

I wonder if fixup_symbol_section is even useful any more, or if it can
be done in some better way.

> Why are minimal symbol values section-relative whereas normal symbol
> values are absolute?

Historically all symbols in gdb were absolute.  However, absolute
addresses are bad -- they prevent sharing symbols across different
objfiles, which isn't common in the single-inferior mode (though the
dlmopen work somewhat changes this), but is pretty common in the
multi-inferior world.

Over time we changed this.  First minsyms were changed, then later
psymbols.  These changes make it possible to share these symbols, which
is a performance win.
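
As a rough illustration of why unrelocated values allow sharing, consider the sketch below.  It is an assumption-laden model rather than GDB's actual data structures: shared_sym, shared_symtab, loaded_objfile and loaded_address are hypothetical names.  One immutable symbol table is read once and referenced by two loads of the same file, each with its own section offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

using CORE_ADDR = std::uint64_t;

// Hypothetical symbol whose value does not depend on the load address.
struct shared_sym
{
  std::string name;
  CORE_ADDR unrelocated_address;
  std::size_t section_index;
};

// Immutable table that can be referenced by several loads at once.
using shared_symtab = std::shared_ptr<const std::vector<shared_sym>>;

// One load of the file: shared symbols plus per-load section offsets.
struct loaded_objfile
{
  shared_symtab symbols;
  std::vector<CORE_ADDR> section_offsets;
};

// Relocation happens at lookup time, using this load's offsets only.
CORE_ADDR
loaded_address (const loaded_objfile &objf, std::size_t sym_index)
{
  assert (sym_index < objf.symbols->size ());
  const shared_sym &sym = (*objf.symbols)[sym_index];
  assert (sym.section_index < objf.section_offsets.size ());
  return sym.unrelocated_address + objf.section_offsets[sym.section_index];
}
```

If the table stored absolute addresses instead, each load would need its own copy, because the same symbol would carry a different value in each inferior.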

The remaining things -- blocks, linetables, and full symbols -- aren't
done simply because no one has done so.  Well, once ages ago I fixed up
linetables, but I never pushed that patch.  It could be resurrected
though.

I've tried to make this change for symbols and blocks various times over
the years, but it's simply been too big, and I haven't really wanted to
land changes incrementally, since I've never been sure I would really
finish the job.  Maybe I ought to revisit this decision.  This was
called "objfile splitting", see
https://sourceware.org/gdb/wiki/ObjfileSplitting

> Why section-relative and not segment-relative?  The section header
> table is optional and (debug symbols) relocation happens on segment
> level.

I have wondered this too but the answer is probably lost, so from my
perspective it is just "gdb has always worked this way".  Changing this
would be fine.

Tom


* Re: gdb symbol relocation
  2022-04-04 15:36 ` Tom Tromey
@ 2022-04-04 15:43   ` Pedro Alves
  2022-04-05  6:03     ` Metzger, Markus T
  0 siblings, 1 reply; 5+ messages in thread
From: Pedro Alves @ 2022-04-04 15:43 UTC (permalink / raw)
  To: Tom Tromey, Metzger, Markus T via Gdb-patches

On 2022-04-04 16:36, Tom Tromey wrote:
>> Why section-relative and not segment-relative?  The section header
>> table is optional and (debug symbols) relocation happens on segment
>> level.
> I have wondered this too but the answer is probably lost, so from my
> perspective it is just "gdb has always worked this way".  Changing this
> would be fine.

Not sure how that would work.  There are targets whose shared objects are
relocatable objects with no segments at all.  You can even have that
on Linux by loading .o files in gdb with "file" or "add-symbol-file",
like gdb.base/relocate.exp.


* RE: gdb symbol relocation
  2022-04-04 15:43   ` Pedro Alves
@ 2022-04-05  6:03     ` Metzger, Markus T
  2022-04-05 10:00       ` Pedro Alves
  0 siblings, 1 reply; 5+ messages in thread
From: Metzger, Markus T @ 2022-04-05  6:03 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Metzger, Markus T via Gdb-patches, Tom Tromey

Hello Pedro,

>>> Why section-relative and not segment-relative?  The section header
>>> table is optional and (debug symbols) relocation happens on segment
>>> level.
>> I have wondered this too but the answer is probably lost, so from my
>> perspective it is just "gdb has always worked this way".  Changing this
>> would be fine.
>
>Not sure how that would work.  There are targets whose shared objects are
>relocatable objects with no segments at all.  You can even have that
>on Linux by loading .o files in gdb with "file" or "add-symbol-file",
>like gdb.base/relocate.exp.

Those targets may want to assume segments for each of their sections
or otherwise model how loading relocatable files works on their target.

The ELF loading view is the program header table.  Assuming no inter-segment
relative references, segments may be relocated by the loader independently
from each other but they'd always get loaded segment by segment.

Also, the debug information would be linked to segment virtual addresses
given that sections don't really have any.  They'd get one indirectly via the
segment that contains them (in theory, a section could even be located in
more than one segment, which, in turn, get loaded at different offsets).

One could probably break it down to sections (ignoring theoretical exceptions)
but the terminology just doesn't feel right to me.  Also, there need not even
be a section header table so in that case one would probably invent sections
based on the program header table.  Which feels rather odd to me.

In practice, it probably doesn't matter, but it makes it harder to understand
how GDB works IMHO.

Regards,
Markus.



* Re: gdb symbol relocation
  2022-04-05  6:03     ` Metzger, Markus T
@ 2022-04-05 10:00       ` Pedro Alves
  0 siblings, 0 replies; 5+ messages in thread
From: Pedro Alves @ 2022-04-05 10:00 UTC (permalink / raw)
  To: Metzger, Markus T; +Cc: Metzger, Markus T via Gdb-patches, Tom Tromey

On 2022-04-05 07:03, Metzger, Markus T wrote:
> Hello Pedro,
> 
>>>> Why section-relative and not segment-relative?  The section header
>>>> table is optional and (debug symbols) relocation happens on segment
>>>> level.
>>> I have wondered this too but the answer is probably lost, so from my
>>> perspective it is just "gdb has always worked this way".  Changing this
>>> would be fine.
>>
>> Not sure how that would work.  There are targets whose shared objects are
>> relocatable objects with no segments at all.  You can even have that
>> on Linux by loading .o files in gdb with "file" or "add-symbol-file",
>> like gdb.base/relocate.exp.
> 
> Those targets may want to assume segments for each of their sections
> or otherwise model how loading relocatable files works on their target.

Note such targets don't have custom GDB code.  They're typically RTOSes that
use a generic GDB build (arm-elf or the like), rely on solib-target.c (it's the
default), and report a library list XML with <section/> elements instead
of <segment/> elements.

"For the common case of libraries that are fully linked binaries, the
library should have a list of segments.  If the target supports
dynamic linking of a relocatable object file, its library XML element
should instead include a list of allocated sections."

(see solib_target_relocate_section_addresses)


> 
> The ELF loading view is the program header table.  Assuming no inter-segment
> relative references, segments may be relocated by the loader independently
> from each other but they'd always get loaded segment by segment.

Sure.  Though I think we can view the per-section load offsets as simply the
segment offsets flattened into the section representation.  Like,
one layer of indirection was eliminated.
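
A minimal sketch of that flattening, under stated assumptions: flat_segment, flat_section and flatten_segment_offsets are hypothetical names, not GDB or BFD APIs.  Each section takes the load offset of the first segment that fully contains it, and a section outside every segment gets offset 0.  The case of a section spanning more than one segment is ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using CORE_ADDR = std::uint64_t;

// Hypothetical loadable segment: link-time range plus its load offset.
struct flat_segment
{
  CORE_ADDR vaddr;
  CORE_ADDR memsz;
  CORE_ADDR load_offset;
};

// Hypothetical section: link-time range only.
struct flat_section
{
  CORE_ADDR vaddr;
  CORE_ADDR size;
};

// Push each segment's load offset down onto the sections it contains,
// so later lookups need only a per-section offset table.
std::vector<CORE_ADDR>
flatten_segment_offsets (const std::vector<flat_segment> &segments,
                         const std::vector<flat_section> &sections)
{
  std::vector<CORE_ADDR> offsets;
  offsets.reserve (sections.size ());
  for (const flat_section &sec : sections)
    {
      CORE_ADDR offset = 0;  // Assumption: unmapped sections are not moved.
      for (const flat_segment &seg : segments)
        if (sec.vaddr >= seg.vaddr
            && sec.vaddr + sec.size <= seg.vaddr + seg.memsz)
          {
            offset = seg.load_offset;
            break;  // First containing segment wins.
          }
      offsets.push_back (offset);
    }
  assert (offsets.size () == sections.size ());
  return offsets;
}
```

After this step the segment table is no longer consulted: relocating an address only requires its section index and the flattened table.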

Also keep in mind that GDB was first released in 1986, while ELF only arrived later, with System V Release 4 (around 1989).
There's also a.out, COFF, XCOFF, etc.

> 
> Also, the debug information would be linked to segment virtual addresses
> given that sections don't really have any.  They'd get one indirectly via the
> segment that contains them (in theory, a section could even be located in
> more than one segment, which, in turn, get loaded at different offsets).

I don't dispute what you're saying.  

> 
> One could probably break it down to sections (ignoring theoretical exceptions)
> but the terminology just doesn't feel right to me.  Also, there need not even
> be a section header table so in that case one would probably invent sections
> based on the program header table.  Which feels rather odd to me.
> 
> In practice, it probably doesn't matter, but it makes it harder to understand
> how GDB works IMHO.

I think that indeed, in practice it doesn't matter.  I mean, I don't recall it ever
causing a big problem.  Even though section headers can be stripped, if you're
debugging at a symbolic level, then you must have (debug) sections.

I have a feeling it all started with bfd -- bfd itself is very section centric.

It's also interesting that LLDB also follows GDB's model, in that addresses are
section-relative throughout, and it's sections that have offsets and are slid/relocated.
