* linker debug info editing
@ 2006-03-10 12:50 Alan Modra
  2006-03-10 19:50 ` James E Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Alan Modra @ 2006-03-10 12:50 UTC
  To: binutils, gdb-patches

[-- Attachment #1: Type: text/plain, Size: 2272 bytes --]

This is a first pass at debug info editing to remove bogus entries for
link-once functions.  I'm not going to commit it just yet as the only
testing I've done is linking a current libstdc++ and examining the
results with readelf, and there are some as yet unresolved issues.

I'm posting now because I'd like some feedback/review, particularly on
the overall design assumptions as detailed in the following comment.
Another thing I'm unsure of at this point is whether I should remove
the CU header when all following info is removed in .debug_info,
.debug_line, and .debug_aranges sections.  I know one consumer (readelf)
became confused when .debug_aranges had an empty unit.  Perhaps other
consumers expect some relationship between the number of CUs in
different debug sections?

One known issue is that .debug_pubnames and .debug_pubtypes have a
debug_info_length field, and I guess I need to edit this when shrinking
.debug_info.

/* This file implements removal of DIEs for discarded link-once sections.
   Such DIEs are detected by parsing debug sections to find DIE boundaries,
   then examining relocations for each DIE.  Any DIE containing a field
   with a relocation against a symbol in a discarded section is marked for
   removal.  A removed DIE in .debug_info results in all of its children
   and associated data in .debug_loc and .debug_ranges being removed too.
   A removed CIE in .debug_frame results in all FDEs using that CIE
   being removed, but no attempt is made to remove unused CIEs (as can
   happen when all of a CIE's FDEs are removed).  Likewise, other unused
   shared info, i.e. .debug_abbrev and .debug_str entries, are not removed.

   We assume that
   a) debug section relocations are sorted by r_offset,
   b) .debug_info location lists (references to .debug_loc) occur
      in increasing order of offset into .debug_loc,
   c) .debug_info ranges (references to .debug_ranges) occur
      in increasing order of offset into .debug_ranges,
   d) entries in .debug_loc and .debug_ranges are not shared,
   e) FDEs always occur at higher offsets than their associated CIEs
      in .debug_frame.

   ld -r using perverse linker scripts can break the first three
   assumptions.  */
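
To make the marking step in the comment above concrete, here is a minimal
sketch in Python.  It is not taken from the attached patch: the `Die` and
`Reloc` records, the `mark_dies` function and the toy offsets are all
hypothetical.  It assumes DIEs are listed in section order with parents
before children, and that relocations are sorted by r_offset (assumption a).

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Die:
    start: int             # offset of the DIE in .debug_info
    end: int               # offset one past the DIE's own attributes
    parent: Optional[int]  # index of the parent DIE, None for a CU DIE


@dataclass
class Reloc:
    r_offset: int          # section offset the relocation applies to
    discarded: bool        # True if the symbol's section was discarded


def mark_dies(dies: List[Die], relocs: List[Reloc]) -> Set[int]:
    """Return the indices of DIEs to remove (hypothetical helper)."""
    removed: Set[int] = set()
    r = 0
    for i, die in enumerate(dies):
        # A removed parent takes all of its children with it.
        if die.parent is not None and die.parent in removed:
            removed.add(i)
        # Skip relocations that lie before this DIE.
        while r < len(relocs) and relocs[r].r_offset < die.start:
            r += 1
        # Examine the relocations that fall inside this DIE.
        while r < len(relocs) and relocs[r].r_offset < die.end:
            if relocs[r].discarded:
                removed.add(i)
            r += 1
    return removed


# Toy input: a CU DIE, a function DIE with one child, and a sibling.
dies = [Die(11, 30, None), Die(30, 50, 0), Die(50, 60, 1), Die(60, 80, 0)]
relocs = [Reloc(12, False), Reloc(32, True), Reloc(62, False)]
print(sorted(mark_dies(dies, relocs)))  # prints [1, 2]
```

Because both lists are sorted, one forward pass over the relocations is
enough, which is why assumption (a) matters.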

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre

[-- Attachment #2: debug.diff.gz --]
[-- Type: application/x-gunzip, Size: 15823 bytes --]


* Re: linker debug info editing
  2006-03-10 12:50 linker debug info editing Alan Modra
@ 2006-03-10 19:50 ` James E Wilson
  2006-03-10 20:04   ` Daniel Jacobowitz
  2006-03-11  1:45 ` Jim Blandy
  2006-03-12 18:15 ` Daniel Jacobowitz
  2 siblings, 1 reply; 8+ messages in thread
From: James E Wilson @ 2006-03-10 19:50 UTC
  To: Alan Modra; +Cc: binutils, gdb-patches

On Fri, 2006-03-10 at 04:49, Alan Modra wrote:
> This is a first pass at debug info editing to remove bogus entries for
> link-once functions.

There is an equivalent gcc solution we could consider.  Create a
separate compilation unit die for each linkonce function.  We could use
section groups to tie the debug info to the linkonce function, so that
the debug info disappears along with the function.
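
As a rough model of how section groups would achieve this, the Python
sketch below keeps the first group seen for each signature and drops every
member section of later groups with the same signature.  The
`resolve_groups` function, the object names and the symbol names are
hypothetical, and real linkers apply further rules that are not modelled
here.

```python
from typing import List, Set, Tuple

# (object file, group signature, member section names)
Group = Tuple[str, str, List[str]]
# (object file, group signature, section name)
Member = Tuple[str, str, str]


def resolve_groups(groups: List[Group]) -> Tuple[List[Member], List[Member]]:
    """Split group members into (kept, dropped), first signature wins."""
    kept: List[Member] = []
    dropped: List[Member] = []
    seen: Set[str] = set()
    for obj, signature, members in groups:
        target = dropped if signature in seen else kept
        seen.add(signature)
        # Code and debug info in the same group share the same fate.
        target.extend((obj, signature, sec) for sec in members)
    return kept, dropped


groups = [
    ("a.o", "_ZN3FooC1Ev", [".text._ZN3FooC1Ev", ".debug_info"]),
    ("b.o", "_ZN3FooC1Ev", [".text._ZN3FooC1Ev", ".debug_info"]),
    ("b.o", "_Z3barv", [".text._Z3barv", ".debug_info"]),
]
kept, dropped = resolve_groups(groups)
```

Here the duplicate constructor from b.o lands in `dropped` together with
its .debug_info member, so no debug info is left describing discarded code.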

We already have a similar scheme in use for header files, that mirrors
the BINCL/EINCL stabs support.  This was one of the new features that
went into the DWARF3 standard.  Unfortunately, this code is not the
default yet.  You have to specify -feliminate-dwarf2-dups to get it.  I
think there was some gdb work that needed to be done to complete the
project, and us gcc developers aren't very good at volunteering to do
gdb work.

Using a similar approach for linkonce functions would probably also
require some gdb work.  For one thing, we would have a lot more
compilation unit dies than we had before (worst case one per function
instead of one per file), and gdb might not be able to handle that.

Checking the DWARF3 standard, it specifically mentions eliminating
function duplication in Appendix E, in section E.4.2.  Appendix E
describes how to use multiple compilation units and section groups to
eliminate duplicate debug info.

If we do need link time editing of dwarf2 debug info, there is a lot of
useful stuff that could be done here, such as eliminating duplicate
debug_abbrev entries.  This is probably more complicated than what you
are attempting here though, but it could perhaps be added later on top
of your work.

You mentioned a bunch of assumptions in the code.  Those assumptions
should be tested against some compilers other than gcc.  Testing the
Intel compiler might not be too hard for instance, especially if you can
get HJ to do it for you.
-- 
Jim Wilson, GNU Tools Support, http://www.specifix.com


* Re: linker debug info editing
  2006-03-10 19:50 ` James E Wilson
@ 2006-03-10 20:04   ` Daniel Jacobowitz
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Jacobowitz @ 2006-03-10 20:04 UTC
  To: James E Wilson; +Cc: Alan Modra, binutils, gdb-patches

On Fri, Mar 10, 2006 at 11:50:01AM -0800, James E Wilson wrote:
> On Fri, 2006-03-10 at 04:49, Alan Modra wrote:
> > This is a first pass at debug info editing to remove bogus entries for
> > link-once functions.
> 
> There is an equivalent gcc solution we could consider.  Create a
> separate compilation unit die for each linkonce function.  We could use
> section groups to tie the debug info to the linkonce function, so that
> the debug info disappears along with the function.

You have to finish enabling COMDAT support everywhere, in order to do
this; we (CodeSourcery) tried when it was contributed, but kept
encountering oddball problems on different platforms.

> We already have a similar scheme in use for header files, that mirrors
> the BINCL/EINCL stabs support.  This was one of the new features that
> went into the DWARF3 standard.  Unfortunately, this code is not the
> default yet.  You have to specify -feliminate-dwarf2-dups to get it.  I
> think there was some gdb work that needed to be done to complete the
> project, and us gcc developers aren't very good at volunteering to do
> gdb work.

I made the required changes, a year and a half ago or thereabouts.

> If we do need link time editing of dwarf2 debug info, there is a lot of
> useful stuff that could be done here, such as eliminating duplicate
> debug_abbrev entries.  This is probably more complicated than what you
> are attempting here though, but it could perhaps be added later on top
> of your work.

There's quite a lot of compression that we could do at link time if
we're going to process DIEs at all.  I think this would be a worthwhile
thing to do, and it requires something like Alan's done (although I
haven't looked at the patch yet).

-- 
Daniel Jacobowitz
CodeSourcery


* Re: linker debug info editing
  2006-03-10 12:50 linker debug info editing Alan Modra
  2006-03-10 19:50 ` James E Wilson
@ 2006-03-11  1:45 ` Jim Blandy
  2006-03-13  2:44   ` Daniel Berlin
  2006-03-12 18:15 ` Daniel Jacobowitz
  2 siblings, 1 reply; 8+ messages in thread
From: Jim Blandy @ 2006-03-11  1:45 UTC
  To: binutils, gdb-patches

After you've chosen dies to delete, how do you deal with other dies
that refer to the deleted dies?  I'm not talking about parents; I'm
talking about attributes whose form is DW_FORM_ref*.
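
The concern can be stated as a simple check.  The Python sketch below is
purely illustrative: `dangling_refs` is a hypothetical helper, and it
assumes the references have already been decoded into
(source DIE offset, target DIE offset) pairs.

```python
from typing import List, Set, Tuple


def dangling_refs(refs: List[Tuple[int, int]],
                  removed: Set[int]) -> List[Tuple[int, int]]:
    """Return references from surviving DIEs to removed DIEs."""
    return [(src, dst) for src, dst in refs
            if src not in removed and dst in removed]


# DIE at 0x80 survives but refers to the removed DIE at 0x30.
refs = [(0x80, 0x30), (0x30, 0x50), (0x90, 0x20)]
removed = {0x30, 0x50}
bad = dangling_refs(refs, removed)
```

Every pair reported this way is a reference that would have to be
redirected or dropped before the edited section is valid.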

I think the information we need to do this reduction correctly isn't
available at the level you're working at.  linkonce sections aren't
really deleted; they're unified.  The data in them doesn't go away;
equivalent data from elsewhere is used instead.

I tend to think that having the compiler divide the information into
separate compilation units, as Jim suggests, is the only way to go
here.  In that scenario, inter-CU references will use symbols to refer
to their targets; after choosing which instance of the linkonce
section to keep, you should still have definitions for all the symbols
the other dies' relocs refer to.

As Daniel says, the GDB-related reasons for avoiding this solution are
long gone.


* Re: linker debug info editing
  2006-03-10 12:50 linker debug info editing Alan Modra
  2006-03-10 19:50 ` James E Wilson
  2006-03-11  1:45 ` Jim Blandy
@ 2006-03-12 18:15 ` Daniel Jacobowitz
  2 siblings, 0 replies; 8+ messages in thread
From: Daniel Jacobowitz @ 2006-03-12 18:15 UTC
  To: binutils, gdb-patches

On Fri, Mar 10, 2006 at 11:19:21PM +1030, Alan Modra wrote:
> /* This file implements removal of DIEs for discarded link-once sections.
>    Such DIEs are detected by parsing debug sections to find DIE boundaries,
>    then examining relocations for each DIE.  Any DIE containing a field
>    with a relocation against a symbol in a discarded section is marked for
>    removal.  A removed DIE in .debug_info results in all of its children
>    and associated data in .debug_loc and .debug_ranges being removed too.
>    A removed CIE in .debug_frame results in all FDEs using that CIE
>    being removed, but no attempt is made to remove unused CIEs (as can
>    happen when all of a CIE's FDEs are removed).  Likewise, other unused
>    shared info, i.e. .debug_abbrev and .debug_str entries, are not removed.
> 
>    We assume that
>    a) debug section relocations are sorted by r_offset,
>    b) .debug_info location lists (references to .debug_loc) occur
>       in increasing order of offset into .debug_loc,
>    c) .debug_info ranges (references to .debug_ranges) occur
>       in increasing order of offset into .debug_ranges,
>    d) entries in .debug_loc and .debug_ranges are not shared,
>    e) FDEs always occur at higher offsets than their associated CIEs
>       in .debug_frame.
> 
>    ld -r using perverse linker scripts can break the first three
>    assumptions.  */

Jim mentioned references to DIEs.  I'm also concerned by deleting just
the DIEs containing discarded relocations and their children; that's
not necessarily a logical place to cut the DIE tree.  In general, this
will work for you, because the relocations you're interested in are
those on DW_AT_high_pc and DW_AT_low_pc; but if the function uses
.debug_ranges instead I guess you won't find it.  And there are probably
some other cases where this is a problem, e.g. if you are cutting
member functions or static variables out of a class type.  Removing
the children can affect the interpretation of the parent.

All these complications are a shame; I think it would be useful if the
linker could edit DWARF data.  But it may be a bit complicated.

-- 
Daniel Jacobowitz
CodeSourcery


* Re: linker debug info editing
  2006-03-11  1:45 ` Jim Blandy
@ 2006-03-13  2:44   ` Daniel Berlin
  2006-03-13  6:23     ` Alan Modra
  2006-03-13 20:19     ` Daniel Berlin
  0 siblings, 2 replies; 8+ messages in thread
From: Daniel Berlin @ 2006-03-13  2:44 UTC
  To: Jim Blandy; +Cc: binutils, gdb-patches

On Fri, 2006-03-10 at 17:44 -0800, Jim Blandy wrote:
> After you've chosen dies to delete, how do you deal with other dies
> that refer to the deleted dies?  I'm not talking about parents; I'm
> talking about attributes whose form is DW_FORM_ref*.

The only correct answer to this is "rewrite all the references all
starting from scratch" :P

You could track it, but the gap tracking you'd have to do is pretty
annoying.  I had to do this once for a dwarf2 duplicate die eliminator.
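
A minimal sketch of that kind of gap tracking, in Python, is shown below.
The `make_remap` helper is hypothetical and assumes the removed byte ranges
are sorted and non-overlapping.  It does not handle references whose
encoded size changes when they are rewritten.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Callable, List, Optional, Tuple


def make_remap(gaps: List[Tuple[int, int]]) -> Callable[[int], Optional[int]]:
    """Build an old-offset -> new-offset mapping.

    gaps holds removed (start, end) byte ranges, end exclusive.
    """
    starts = [s for s, _ in gaps]
    ends = [e for _, e in gaps]
    # cum[i] is the number of bytes removed by the first i gaps.
    cum = [0] + list(accumulate(e - s for s, e in gaps))

    def remap(old: int) -> Optional[int]:
        i = bisect_right(starts, old)  # gaps starting at or before old
        if i and old < ends[i - 1]:
            return None                # the target itself was removed
        return old - cum[i]

    return remap


remap = make_remap([(30, 60), (100, 120)])
print(remap(20), remap(60), remap(120), remap(110))  # prints 20 30 70 None
```

Each lookup is a binary search over the gap list, so rewriting every
reference costs O(n log g) for n references and g gaps.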

SGI's linker eliminated duplicate dies at link time, IIRC.

> I think the information we need to do this reduction correctly isn't
> available at the level you're working at.  linkonce sections aren't
> really deleted; they're unified.  The data in them doesn't go away;
> equivalent data from elsewhere is used instead.
> 
> I tend to think that having the compiler divide the information into
> separate compilation units, as Jim suggests, is the only way to go
> here.  In that scenario, inter-CU references will use symbols to refer
> to their targets; after choosing which instance of the linkonce
> section to keep, you should still have definitions for all the symbols
> the other dies' relocs refer to.
> 
> As Daniel says, the GDB-related reasons for avoiding this solution are
> long gone.
> 

The problem with the inter-CU references and section splitting scheme
(i.e. -feliminate-dwarf2-dups) is that it has a greater constant
overhead than straight elimination, because ref_addr forms have larger
values, plus the different number of sections.  When you have 80 meg of
debug info, referencing with the absolute offset from the beginning of
.debug_info ends up being 4 bytes, while otherwise it would have been 1
for an in-cu reference.

This adds up quite quickly.  For a lot of files, we lost >8-10% of space
savings due to overhead.
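
The per-reference cost can be worked through with a small Python sketch.
It assumes the producer picks the smallest fixed-size CU-relative form
(DW_FORM_ref1, DW_FORM_ref2 or DW_FORM_ref4) and that DW_FORM_ref_addr
takes 4 bytes, as in the figures above.  The function names and sample
offsets are made up for illustration.

```python
from typing import Iterable

REF_ADDR_SIZE = 4  # bytes for DW_FORM_ref_addr, per the message above


def in_cu_ref_size(cu_relative_offset: int) -> int:
    """Size in bytes of the smallest fixed-size CU-relative form."""
    if cu_relative_offset < 1 << 8:
        return 1   # DW_FORM_ref1
    if cu_relative_offset < 1 << 16:
        return 2   # DW_FORM_ref2
    return 4       # DW_FORM_ref4 (DW_FORM_ref8 is ignored here)


def extra_bytes(cu_relative_offsets: Iterable[int]) -> int:
    """Bytes added when each in-CU reference becomes a ref_addr."""
    return sum(REF_ADDR_SIZE - in_cu_ref_size(o)
               for o in cu_relative_offsets)


# Three references: one fits ref1, one needs ref2, one needs ref4.
print(extra_bytes([0x10, 0x200, 0x20000]))  # prints 5
```

References to nearby DIEs are the ones that suffer most, since each grows
from 1 byte to 4.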

In cases where you have < 10 meg of debug info, it sometimes even lost
out to not eliminating duplicates at all (even though there were, in
fact, lots of duplicates).

Also, deciding what to put into the split sections is hard.  You can't
just split every type and program into a separate CU, and ref_addr
everything.  The overhead of doing so is enormous.

I spent a large amount of time when we were implementing
-feliminate-dwarf2-dups measuring the cost of various schemes for
deciding what to try to split and what not.

I came to the conclusion that splitting sections should only really be
used if you can't have something that just goes through and eliminates
all duplicates by understanding and rewriting the dwarf2 info all at
once at link time.

* Re: linker debug info editing
  2006-03-13  2:44   ` Daniel Berlin
@ 2006-03-13  6:23     ` Alan Modra
  2006-03-13 20:19     ` Daniel Berlin
  1 sibling, 0 replies; 8+ messages in thread
From: Alan Modra @ 2006-03-13  6:23 UTC
  To: Daniel Berlin; +Cc: Jim Blandy, binutils, gdb-patches

On Sun, Mar 12, 2006 at 09:44:27PM -0500, Daniel Berlin wrote:
> On Fri, 2006-03-10 at 17:44 -0800, Jim Blandy wrote:
> > After you've chosen dies to delete, how do you deal with other dies
> > that refer to the deleted dies?  I'm not talking about parents; I'm
> > talking about attributes whose form is DW_FORM_ref*.
> 
> The only correct answer to this is "rewrite all the references all
> starting from scratch" :P

Yes, that's the conclusion I was coming to.  It's a pain.  All the
offsets can change, and in some cases there can even be an increase,
e.g. rewriting a DW_FORM_ref2 as a DW_FORM_ref_addr to point at the
proper info (that for the kept function).  I think all the info
is there for ld to do this, but it's quite a lot more work than I
realized.  When I started, I knew more about how to do this in ld than
what actually needed doing.  :)

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre


* Re: linker debug info editing
  2006-03-13  2:44   ` Daniel Berlin
  2006-03-13  6:23     ` Alan Modra
@ 2006-03-13 20:19     ` Daniel Berlin
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Berlin @ 2006-03-13 20:19 UTC
  To: Jim Blandy; +Cc: binutils, gdb-patches


> I spent a large amount of time when we were implementing
> -feliminate-dwarf2-dups measuring the cost of various schemes for
> deciding what to try to split and what not.

Oh, I forgot the other half of the problems.

If you don't do something like split every die, then you get into things
where you have to name the linkonce sections by the content (using md5
hashes or something), so that you don't accidentally eliminate one set
of dies that really isn't an exact duplicate of some other.

Once you do this you get hit with ordering issues if the include files
in something else are in a slightly different order, and this causes the
md5's to be different.

*or* you hit the problem that you may not split exactly the same (due to
ordering, more dies, etc), and thus, the dies in one split CU may be
different than another, and thus, will not be commoned (because they
will have different linkonce section names from the content hash)
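
The ordering problem is easy to reproduce with a content hash.  The Python
sketch below is only an illustration: the `section_name` helper, the use
of a hex digest in the name and the byte strings standing in for encoded
DIEs are all assumptions.

```python
import hashlib
from typing import List


def section_name(die_blobs: List[bytes]) -> str:
    """Name a linkonce debug section after the md5 of its contents."""
    digest = hashlib.md5(b"".join(die_blobs)).hexdigest()
    return ".debug_info.linkonce." + digest


# The same two DIEs, emitted in a different order by two objects.
first = [b"typedef size_t", b"struct foo"]
second = [b"struct foo", b"typedef size_t"]

same_order_matches = section_name(first) == section_name(list(first))
reordered_matches = section_name(first) == section_name(second)
print(same_order_matches, reordered_matches)  # prints True False
```

The reordered unit gets a different section name, so the linker sees two
unrelated linkonce sections and keeps both copies.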

If you do something obvious like split every type/subprogram die into
its own section, and linkonce those, the cost in .o files is enormous.

Then, for each type DIE, you have:

- 12 bytes for a debug_info header
- ~10-20 bytes of extra overhead from using DW_FORM_ref_addr references
  (assuming it is referenced ~3-5 times)
- 30 bytes for the new section name (.debug_info.linkonce.<16 byte md5>)
- 10+ bytes (I forget the number) for the extra section table entry
  you've added
- 10+ bytes for the extra symbol you need to reference the thing
  everywhere else

It ends up being about 100 bytes per die you put into its own CU.

If you have a C++ app, and 5000 types, you've just wasted 500k.
I had tried this scheme once, and it usually doubled (or more) the size
of the .o files.
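
The arithmetic can be checked with a few lines of Python.  The table simply
restates the per-DIE figures quoted above, taking the "10+" entries at
their lower bound, and the names are made up for this sketch.

```python
# (low, high) byte estimates per split-out DIE, from the figures above.
PER_DIE_OVERHEAD = {
    "debug_info header": (12, 12),
    "ref_addr references": (10, 20),
    "section name": (30, 30),
    "section table entry": (10, 10),  # "10+" in the message, lower bound
    "extra symbol": (10, 10),         # "10+" in the message, lower bound
}

low = sum(lo for lo, _ in PER_DIE_OVERHEAD.values())
high = sum(hi for _, hi in PER_DIE_OVERHEAD.values())


def total_overhead(n_dies: int, per_die: int = 100) -> int:
    """Total bytes of overhead, using the rounded ~100 bytes per DIE."""
    return n_dies * per_die


print(low, high, total_overhead(5000))  # prints 72 82 500000
```

The itemized lower bounds sum to 72-82 bytes, so "about 100" leaves room
for the open-ended "10+" entries, and 5000 types at 100 bytes each gives
the 500k quoted above.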

IOW, it's always a better idea to eliminate in the linker if possible,
which has *none* of these issues.

It's hard work, but straightforward there if you have all the info you
need.


