public inbox for gdb@sourceware.org
* Re: Split DWARF and rnglists, gcc vs clang
       [not found] ` <20201113001143.GA2654@wildebeest.org>
@ 2020-11-13 14:45   ` Simon Marchi
  2020-11-13 15:18     ` Mark Wielaard
  0 siblings, 1 reply; 4+ messages in thread
From: Simon Marchi @ 2020-11-13 14:45 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: gdb, gcc, gdb-patches

On 2020-11-12 7:11 p.m., Mark Wielaard wrote:
> Hi Simon,
>
> On Thu, Nov 05, 2020 at 11:11:43PM -0500, Simon Marchi wrote:
>> I'm currently squashing some bugs related to .debug_rnglists in GDB, and
>> I happened to notice that clang and gcc do different things when
>> generating rnglists with split DWARF.  I'd like to know if the two
>> behaviors are acceptable, and therefore if we need to make GDB accept
>> both.  Or maybe one of them is not doing things correctly and would need
>> to be fixed.
>>
>> clang generates a .debug_rnglists.dwo section in the .dwo file.  Any
>> DW_FORM_rnglistx attribute in the DWO refers to that section.  That
>> section is not shared with any other object, so DW_AT_rnglists_base is
>> never involved for these attributes.  Note that there might still be a
>> DW_AT_rnglists_base on the DW_TAG_skeleton_unit, in the linked file,
>> used if the skeleton itself has an attribute of form DW_FORM_rnglistx.
>> This rnglist would be found in a .debug_rnglists section in the linked
>> file, shared with the other units of the linked file.
>>
>> gcc generates a single .debug_rnglists in the linked file and no
>> .debug_rnglists.dwo in the DWO files.  So when an attribute has form
>> DW_FORM_rnglistx in a DWO file, I presume we need to do the lookup in
>> the .debug_rnglists section in the linked file, using the
>> DW_AT_rnglists_base attribute found in the corresponding skeleton unit.
>> This looks vaguely similar to how it was done pre-DWARF 5, with
>> DW_AT_GNU_ranges_base.
>>
>> So, is gcc wrong here?  I don't see anything in the DWARF 5 spec
>> prohibiting to do it like gcc does, but clang's way of doing it sounds
>> more in-line with the intent of what's described in the DWARF 5 spec.
>> So I wonder if it's maybe an oversight or a misunderstanding between the
>> two compilers.
>
> I think I would have asked the question the other way around :) The
> spec explicitly describes rnglists_base (and loclists_base) as a way
> to reference ranges (loclists) through the index table, so that the
> only relocation you need is in the (skeleton) DIE.

I presume you reference this non-normative text in section 2.17.3?

    This range list representation, the rnglist class, and the related
    DW_AT_rnglists_base attribute are new in DWARF Version 5. Together
    they eliminate most or all of the object language relocations
    previously needed for range lists.

What I understand from this is that the rnglist class and
DW_AT_rnglists_base attribute help reduce the number of relocations in
the non-split case (it removes the need for relocations from
DW_AT_ranges attribute values in .debug_info to .debug_rnglists).  I
don't understand it as saying anything about where to put the rnglist
data in the split-unit case.
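
As a rough illustration of the non-split case described above: the unit
carries one DW_AT_rnglists_base (the only value needing a relocation), and
every DW_FORM_rnglistx is then a plain index into the offsets array that
follows the .debug_rnglists header.  The sketch below assumes the 32-bit
DWARF format and little-endian data; build_rnglists_table and
resolve_rnglistx are hypothetical helper names, not functions from GDB or
any other tool.

```python
import struct

# DWARF 5 .debug_rnglists header in the 32-bit DWARF format:
# unit_length (4) + version (2) + address_size (1)
# + segment_selector_size (1) + offset_entry_count (4) = 12 bytes.
RNGLISTS_HEADER_SIZE_32 = 12


def build_rnglists_table(list_bodies, address_size=8):
    """Build a toy little-endian .debug_rnglists contribution (hypothetical helper)."""
    count = len(list_bodies)
    offsets = []
    pos = 4 * count  # the lists start right after the offsets array
    for body in list_bodies:
        offsets.append(pos)  # each offset is relative to the offsets array
        pos += len(body)
    payload = struct.pack("<HBBI", 5, address_size, 0, count)
    payload += b"".join(struct.pack("<I", off) for off in offsets)
    payload += b"".join(list_bodies)
    # unit_length counts everything after the unit_length field itself.
    return struct.pack("<I", len(payload)) + payload


def resolve_rnglistx(section, rnglists_base, index):
    """Return the section offset of range list number `index`.

    `rnglists_base` is the value of DW_AT_rnglists_base, i.e. the offset
    of the first entry of the offsets array.  No relocation is involved
    here: the index and the table entries are link-time constants.
    """
    (relative,) = struct.unpack_from("<I", section, rnglists_base + 4 * index)
    return rnglists_base + relative
```

With two lists in the table, index 0 and index 1 resolve to different
offsets in the same section, and only the base itself would have needed a
relocation.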

> But the rnglists
> (loclists) themselves can still use relocations. A large part of them
> is non-shared addresses, so using indexes (into the .debug_addr
> addr_base) would simply be extra overhead. The relocations will
> disappear once linked, but the index tables won't.
>
> As an alternative, if you like to minimize the amount of debug data in
> the main object file, the spec also describes how to put a whole
> .debug_rnglists.dwo (or .debug_loclists.dwo) in the split dwarf
> file. Then you cannot use all entry encodings and do need to use an
> .debug_addr index to refer to any addresses in that case. So the
> relocations are still there, you just refer to them through an extra
> index indirection.
>
> So I believe both encodings are valid according to the spec. It just
> depends on what you are optimizing for, small main object file size or
> smallest encoding with least number of indirections.

So, if I understand correctly, gcc's way of doing things (putting all
the rnglists in a common .debug_rnglists section) reduces the overall
size of debug info since the rnglists can use the direct addressing
rnglists entries (e.g. DW_RLE_start_end) rather than the indirect ones
(e.g. DW_RLE_startx_endx).  But this comes at the expense of a lot of
relocations in the rnglists themselves, since they refer to addresses
directly.
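
To make the size trade-off concrete, here is a minimal sketch comparing
the two kinds of entries.  It assumes 8-byte addresses and little-endian
encoding; the function names are hypothetical, and the accounting for
.debug_addr only holds when the two addresses are not shared with any
other user of the address table.

```python
DW_RLE_STARTX_ENDX = 0x02  # operands: two ULEB128 indices into .debug_addr
DW_RLE_START_END = 0x06    # operands: two full target addresses


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_start_end(start, end, address_size=8):
    """Direct entry: both addresses are stored inline (each needs a relocation)."""
    return (bytes([DW_RLE_START_END])
            + start.to_bytes(address_size, "little")
            + end.to_bytes(address_size, "little"))


def encode_startx_endx(start_index, end_index):
    """Indirect entry: only indices are stored; the addresses live in .debug_addr."""
    return bytes([DW_RLE_STARTX_ENDX]) + uleb128(start_index) + uleb128(end_index)


direct_size = len(encode_start_end(0x1000, 0x2000))
# Indirect entry plus the two 8-byte .debug_addr slots it refers to.
indirect_size = len(encode_startx_endx(3, 4)) + 2 * 8
```

Under these assumptions the direct entry is smaller overall, and both
variants still carry two relocated addresses: in the range list itself
for the direct form, in .debug_addr for the indexed form.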

I thought that the main point of split-units was to reduce the number of
relocations processed by the linker and data moved around by the linker,
to reduce link time and provide a better edit-build-debug cycle.  Is
that the case?

Anyway, regardless of the intent, the spec should ideally be clear about
that so we don't have to guess.

> P.S. I am really interested in these interpretations of DWARF, but I
> don't really follow the gdb implementation details very much. Could we
> maybe move discussions like these from the -patches list to the main
> gdb (or gcc) mailinglist?

Sure, I added gdb@ and gcc@.  I also left gdb-patches@ so that it's
possible to follow the discussion there.

Simon


* Re: Split DWARF and rnglists, gcc vs clang
  2020-11-13 14:45   ` Split DWARF and rnglists, gcc vs clang Simon Marchi
@ 2020-11-13 15:18     ` Mark Wielaard
  2020-11-13 15:41       ` Simon Marchi
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Wielaard @ 2020-11-13 15:18 UTC (permalink / raw)
  To: Simon Marchi; +Cc: gdb, gcc, gdb-patches

Hi Simon,

On Fri, 2020-11-13 at 09:45 -0500, Simon Marchi wrote:
> > I think I would have asked the question the other way around :) The
> > spec explicitly describes rnglists_base (and loclists_base) as a way
> > to reference ranges (loclists) through the index table, so that the
> > only relocation you need is in the (skeleton) DIE.
> 
> I presume you reference this non-normative text in section 2.17.3?
> 
>     This range list representation, the rnglist class, and the related
>     DW_AT_rnglists_base attribute are new in DWARF Version 5. Together
>     they eliminate most or all of the object language relocations
>     previously needed for range lists.

That too, but I was actually referring to the sections that define
Range List and Location List Tables (7.28 and 7.29) which define the
meaning of DW_AT_rnglists_base and DW_AT_loclists_base. But you could
also look at 3.1.3 Split Full Compilation Unit Entries which says that
those base attributes are inherited from the corresponding skeleton
compilation unit for a split unit.

> What I understand from this is that the rnglist class and
> DW_AT_rnglists_base attribute help reduce the number of relocations in
> the non-split case (it removes the need for relocations from
> DW_AT_ranges attribute values in .debug_info to .debug_rnglists).  I
> don't understand it as saying anything about where to put the rnglist
> data in the split-unit case.

I interpreted it as when there is a base attribute in the (skeleton)
unit, then the corresponding section (index table) can be found in the
main object file. At least that is how elfutils libdw interprets the
base attributes, not just for rnglists_base, but also str_offsets_base,
addr_base, etc. And that is also how/when GCC emits them.

> > So I believe both encodings are valid according to the spec. It just
> > depends on what you are optimizing for, small main object file size or
> > smallest encoding with least number of indirections.
> 
> So, if I understand correctly, gcc's way of doing things (putting all
> the rnglists in a common .debug_rnglists section) reduces the overall
> size of debug info since the rnglists can use the direct addressing
> rnglists entries (e.g. DW_RLE_start_end) rather than the indirect ones
> (e.g. DW_RLE_startx_endx).  But this comes at the expense of a lot of
> relocations in the rnglists themselves, since they refer to addresses
> directly.

Yes, and it reduces the number of .debug_addr entries that need
relocations.

> I thought that the main point of split-units was to reduce the number of
> relocations processed by the linker and data moved around by the linker,
> to reduce link time and provide a better edit-build-debug cycle.  Is
> that the case?

I think it depends on who exactly you ask and what their specific
goals/setups are. Both things, reducing the number of relocations and
moving data out of the main object file, are independently useful in
different contexts. But I think it is mainly reducing the number of
relocations that is beneficial. For example clang (but not yet gcc)
supports having the .dwo sections themselves in the main object file
(using SHF_EXCLUDE for the .dwo sections, so the linker will still
skip them). Which is also a possibility that the spec describes and
which really makes split DWARF much more usable, because then you don't
need to change your build system to deal with multiple output files.
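
A small sketch of what "the linker will still skip them" amounts to.  The
ELF section flag is spelled SHF_EXCLUDE (0x80000000) in elf.h; the
(name, sh_flags) pairs below are a hypothetical stand-in for whatever ELF
reader is actually used, and the function names are made up for this
example.

```python
SHF_EXCLUDE = 0x80000000  # ELF sh_flags bit: leave the section out of the linked output


def linker_would_drop(sh_flags):
    """True if the section is marked to be excluded when linking."""
    return bool(sh_flags & SHF_EXCLUDE)


def embedded_dwo_sections(sections):
    """Return the names of .dwo sections that live in the object but are excluded.

    `sections` is an iterable of (name, sh_flags) pairs.
    """
    return [name for name, flags in sections
            if name.endswith(".dwo") and linker_would_drop(flags)]
```

An object file built this way would carry both the regular .debug_*
sections, which get linked, and the excluded .debug_*.dwo sections, which
stay behind in the .o file.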

> > P.S. I am really interested in these interpretations of DWARF, but I
> > don't really follow the gdb implementation details very much. Could we
> > maybe move discussions like these from the -patches list to the main
> > gdb (or gcc) mailinglist?
> 
> Sure, I added gdb@ and gcc@.  I also left gdb-patches@ so that it's
> possible to follow the discussion there.

Thanks,

Mark


* Re: Split DWARF and rnglists, gcc vs clang
  2020-11-13 15:18     ` Mark Wielaard
@ 2020-11-13 15:41       ` Simon Marchi
  2020-11-13 18:34         ` Mark Wielaard
  0 siblings, 1 reply; 4+ messages in thread
From: Simon Marchi @ 2020-11-13 15:41 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: gdb, gcc, gdb-patches

On 2020-11-13 10:18 a.m., Mark Wielaard wrote:
> That too, but I was actually referring to the sections that define
> Range List and Location List Tables (7.28 and 7.29) which define the
> meaning of DW_AT_rnglists_base and DW_AT_loclists_base. But you could
> also look at 3.1.3 Split Full Compilation Unit Entries which says that
> those base attributes are inherited from the corresponding skeleton
> compilation unit for a split unit.

Hmm, indeed, if we interpret that sentence in 3.1.3 to the letter, it
suggests that the DW_FORM_rnglistx attributes in the DWO are meant
to point in the linked file's .debug_rnglists.  Otherwise, inheriting
DW_AT_rnglists_base wouldn't be meaningful.

But when DWO files use a .debug_rnglists.dwo section, it doesn't make
sense to consider the inherited DW_AT_rnglists_base.

So in the end the logical thing to do when encountering a
DW_FORM_rnglistx in a split-unit, in order to support everybody, is
probably to go to the .debug_rnglists.dwo section, if there's one,
disregarding the (inherited) DW_AT_rnglists_base.  If there isn't, then
try the linked file's .debug_rnglists section, using
DW_AT_rnglists_base.  If there isn't, then something is malformed.
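
The lookup order above can be sketched as follows.  This is only an
illustration, not GDB's implementation: sections are modelled as
dictionaries mapping section names to bytes, the names
pick_rnglists_section and MalformedDebugInfo are hypothetical, and the
base used for the .dwo case assumes the 32-bit DWARF format with the
offsets array directly after a 12-byte header.  Treating a missing
DW_AT_rnglists_base in the second case as malformed is also an
assumption of this sketch.

```python
RNGLISTS_HEADER_SIZE_32 = 12  # size of a 32-bit-format .debug_rnglists header


class MalformedDebugInfo(Exception):
    """Raised when no usable range list table can be found."""


def pick_rnglists_section(dwo_sections, linked_sections, inherited_rnglists_base):
    """Return (section_bytes, base) for a DW_FORM_rnglistx found in a split unit."""
    dwo = dwo_sections.get(".debug_rnglists.dwo")
    if dwo is not None:
        # clang-style layout: use the DWO's own table and disregard the
        # DW_AT_rnglists_base inherited from the skeleton.
        return dwo, RNGLISTS_HEADER_SIZE_32

    linked = linked_sections.get(".debug_rnglists")
    if linked is not None and inherited_rnglists_base is not None:
        # gcc-style layout: the table lives in the linked file and is
        # located through the skeleton's DW_AT_rnglists_base.
        return linked, inherited_rnglists_base

    raise MalformedDebugInfo("DW_FORM_rnglistx with no range list table to index")
```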

>> What I understand from this is that the rnglist class and
>> DW_AT_rnglists_base attribute help reduce the number of relocations in
>> the non-split case (it removes the need for relocations from
>> DW_AT_ranges attribute values in .debug_info to .debug_rnglists).  I
>> don't understand it as saying anything about where to put the rnglist
>> data in the split-unit case.
>
> I interpreted it as when there is a base attribute in the (skeleton)
> unit, then the corresponding section (index table) can be found in the
> main object file.

That doesn't work with how clang produces it, AFAIU.  There is a
DW_AT_rnglists_base attribute in the skeleton and a .debug_rnglists in
the linked file, which is used for the skeleton's DW_AT_ranges
attribute.  And there are also .debug_rnglists.dwo sections in the DWO
files.  So DW_FORM_rnglistx values in the skeleton use the
.debug_rnglists in the linked file, while the DW_FORM_rnglistx values
in the DWO file use the .debug_rnglists.dwo in that file (even though
there is a DW_AT_rnglists_base in the skeleton).

> At least that is how elfutils libdw interprets the
> base attributes, not just for rnglists_base, but also str_offsets_base,
> addr_base, etc. And that is also how/when GCC emits them.
>
>>> So I believe both encodings are valid according to the spec. It just
>>> depends on what you are optimizing for, small main object file size or
>>> smallest encoding with least number of indirections.
>>
>> So, if I understand correctly, gcc's way of doing things (putting all
>> the rnglists in a common .debug_rnglists section) reduces the overall
>> size of debug info since the rnglists can use the direct addressing
>> rnglists entries (e.g. DW_RLE_start_end) rather than the indirect ones
>> (e.g. DW_RLE_startx_endx).  But this comes at the expense of a lot of
>> relocations in the rnglists themselves, since they refer to addresses
>> directly.
>
> Yes, and it reduces the number of .debug_addr entries that need
> relocations.
>
>> I thought that the main point of split-units was to reduce the number of
>> relocations processed by the linker and data moved around by the linker,
>> to reduce link time and provide a better edit-build-debug cycle.  Is
>> that the case?
>
> I think it depends on who exactly you ask and what their specific
> goals/setups are. Both things, reducing the number of relocations and
> moving data out of the main object file, are independently useful in
> different contexts. But I think it is mainly reducing the number of
> relocations that is beneficial. For example clang (but not yet gcc)
> supports having the .dwo sections themselves in the main object file
> (using SHF_EXCLUDE for the .dwo sections, so the linker will still
> skip them). Which is also a possibility that the spec describes and
> which really makes split DWARF much more usable, because then you don't
> need to change your build system to deal with multiple output files.

Not sure I understand.  Does that mean that the .dwo sections are
emitted in the .o files, and that's the end of the road for them?  The
DW_AT_dwo_name attributes of the skeletons then refer to the .o files?

Simon


* Re: Split DWARF and rnglists, gcc vs clang
  2020-11-13 15:41       ` Simon Marchi
@ 2020-11-13 18:34         ` Mark Wielaard
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Wielaard @ 2020-11-13 18:34 UTC (permalink / raw)
  To: Simon Marchi; +Cc: gdb, gcc, gdb-patches

Hi Simon,

On Fri, 2020-11-13 at 10:41 -0500, Simon Marchi wrote:
> So in the end the logical thing to do when encountering a
> DW_FORM_rnglistx in a split-unit, in order to support everybody, is
> probably to go to the .debug_rnglists.dwo section, if there's one,
> disregarding the (inherited) DW_AT_rnglists_base.  If there isn't, then
> try the linked file's .debug_rnglists section, using
> DW_AT_rnglists_base.  If there isn't, then something is malformed.

Yes, I think that makes sense.

> > I interpreted it as when there is a base attribute in the (skeleton)
> > unit, then the corresponding section (index table) can be found in the
> > main object file.
> 
> That doesn't work with how clang produces it, AFAIU.  There is a
> DW_AT_rnglists_base attribute in the skeleton and a .debug_rnglists in
> the linked file, which is used for the skeleton's DW_AT_ranges
> attribute.  And there are also .debug_rnglists.dwo sections in the DWO
> files.  So DW_FORM_rnglistx values in the skeleton use the
> .debug_rnglists in the linked file, while the DW_FORM_rnglistx values
> in the DWO file use the .debug_rnglists.dwo in that file (even though
> there is a DW_AT_rnglists_base in the skeleton).

I would have expected the skeleton's DW_AT_ranges to use
DW_FORM_sec_offset, not DW_FORM_rnglistx. Precisely because you would
then get an ambiguity. But it would indeed be good to handle that
situation.

> > I think it depends on who exactly you ask and what their specific
> > goals/setups are. Both things, reducing the number of relocations and
> > moving data out of the main object file, are independently useful in
> > different contexts. But I think it is mainly reducing the number of
> > relocations that is beneficial. For example clang (but not yet gcc)
> > supports having the .dwo sections themselves in the main object file
> > (using SHF_EXCLUDE for the .dwo sections, so the linker will still
> > skip them). Which is also a possibility that the spec describes and
> > which really makes split DWARF much more usable, because then you don't
> > need to change your build system to deal with multiple output files.
> 
> Not sure I understand.  Does that mean that the .dwo sections are
> emitted in the .o files, and that's the end of the road for them?  The
> DW_AT_dwo_name attributes of the skeletons then refer to the .o files?

Yes, precisely. I am not sure whether it is already in any released
clang, but if it is you could try -gsplit-dwarf=single to see an
example.

Note that elfutils libdw doesn't yet handle that variant. Luckily not
because of a design issue, but because there are some sanity checks
that trigger when seeing a .debug_xxx and .debug_xxx.dwo section in the
same file. I have a partial patch to fix that and make it so that you
can explicitly open a file as either a "main" Dwarf or "split" Dwarf.
The only thing it doesn't do yet is share the file handle between the
Dwarf objects (which isn't strictly needed, but would be a nice
optimization).
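
To illustrate the situation such a sanity check runs into, here is a
minimal sketch that classifies a file by its section names and then
selects the sections for an explicitly requested view.  It is not libdw's
actual code; classify_debug_sections, sections_for_view and the "main" /
"split" / "single" labels are hypothetical names for this example.

```python
def _is_debug(name):
    return name.startswith(".debug_")


def classify_debug_sections(section_names):
    """Classify a file as 'main', 'split', 'single' (both kinds) or 'none'."""
    names = set(section_names)
    has_main = any(_is_debug(n) and not n.endswith(".dwo") for n in names)
    has_split = any(_is_debug(n) and n.endswith(".dwo") for n in names)
    if has_main and has_split:
        return "single"  # .o == .dwo: both section families in one file
    if has_split:
        return "split"
    if has_main:
        return "main"
    return "none"


def sections_for_view(section_names, view):
    """Pick the sections to read when opening the file as 'main' or 'split'."""
    if view not in ("main", "split"):
        raise ValueError("view must be 'main' or 'split'")
    want_dwo = view == "split"
    return sorted(n for n in set(section_names)
                  if _is_debug(n) and n.endswith(".dwo") == want_dwo)
```

With a "single" file, the same list of section names yields two disjoint
views, which is why a reader needs to be told which one to open.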

I actually think having a "single" split-dwarf file (.o == .dwo) is the
best way to support Split Dwarf more generically because then it would
simply work without having to adjust all build systems to work
with/around separate .dwo files.

Cheers,

Mark
