public inbox for archer@sourceware.org
* Fedora 14 debug proposal
@ 2010-06-08 20:38 Tom Tromey
  2010-06-09 13:54 ` Dodji Seketeli
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Tom Tromey @ 2010-06-08 20:38 UTC (permalink / raw)
  To: Project Archer; +Cc: Jakub Jelinek

Hi all.

This week I want to file a Fedora 14 feature proposal touching on
debuginfo.  I thought we could discuss it here first.

The proposal has 2 or maybe 3 parts.

1. Generate index files for the separate .debug files.  This involves
   running gdb to dump the index, something like:

   cd $dir
   gdb --batch-silent -ex 'maintenance save-gnu-index .' whatever.debug

   The generated index files are architecture-neutral, if it matters.

   The point of this change is that it makes gdb startup much, much
   faster.  The indices can be mapped directly and make both by-name and
   by-address lookups very fast.

   I think this should help things like ABRT, plus of course anybody who
   wants to debug into any library we ship.

   I don't know where the debuginfo stripping is done right now, but
   that seems like the best place to run this script.

   These gdb changes aren't upstream yet.  I wanted to get Fedora buy-in
   before dealing with that.

2. Change GCC so that it no longer emits .debug_aranges,
   .debug_pubnames, and .debug_pubtypes.

   From what I can tell, no program uses these sections.  They just
   waste space.

   Well... Fedora gdb does use .debug_aranges, but that use is replaced
   by the index.  .debug_aranges is a reasonable-enough section; it is
   just that we really also need by-name indices to get good
   performance, and having the whole index be mmap()able gives better
   startup performance.

   I think .debug_pub* are pretty useless.  GCC didn't even generate
   pubtypes for years, and it had a lot of pubnames bugs... maybe it
   still does.  What this means is that we can't really make gdb rely on
   them.  Also, based on earlier experiments, reading these sections is
   actually still too slow.  The index is better.

   I can write the gcc patch for this.

3. If we are shipping GCC 4.5 in F14, I think we should enable the
   .debug_types stuff by default.  This will shrink debuginfo and it
   makes gdb use less memory.

   This one is optional, in particular I assume it will be subsumed by
   the other DWARF compression work.
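
For illustration only, part 1 could be driven by something like the
sketch below.  It builds the gdb command line quoted above, once per
.debug file under a directory tree.  The function names and the
directory layout are assumptions made for the example, and the
`maintenance save-gnu-index` command is taken verbatim from this
message; it is not upstream yet, so a given gdb build may not have it.

```python
from pathlib import Path


def index_command(debug_file, out_dir="."):
    """Build the gdb argv from part 1 for a single .debug file.

    The gdb command string is copied from the proposal; everything
    else here (names, defaults) is hypothetical.
    """
    return [
        "gdb",
        "--batch-silent",
        "-ex",
        f"maintenance save-gnu-index {out_dir}",
        str(debug_file),
    ]


def plan(root):
    """Return (working directory, argv) pairs, one per *.debug file.

    Mirrors the "cd $dir; gdb ... whatever.debug" shape: gdb is meant
    to run inside the directory that holds the file, writing the index
    to ".".
    """
    files = sorted(Path(root).rglob("*.debug"))
    return [(str(p.parent), index_command(p.name)) for p in files]
```

Each pair could then be handed to `subprocess.run(argv, cwd=cwd,
check=True)`.  Building the list first and running it separately keeps
the sketch testable without gdb installed.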

Tom

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fedora 14 debug proposal
  2010-06-08 20:38 Fedora 14 debug proposal Tom Tromey
@ 2010-06-09 13:54 ` Dodji Seketeli
  2010-06-09 16:11   ` Tom Tromey
  2010-06-11 20:33 ` Tom Tromey
  2010-06-13 10:40 ` Roland McGrath
  2 siblings, 1 reply; 11+ messages in thread
From: Dodji Seketeli @ 2010-06-09 13:54 UTC (permalink / raw)
  To: tromey; +Cc: Project Archer, Jakub Jelinek

Tom Tromey <tromey@redhat.com> writes:

[...]

> 2. Change GCC so that it no longer emits .debug_aranges,
>    .debug_pubnames, and .debug_pubtypes.
>
>    From what I can tell, no program uses these sections.  They just
>    waste space.
>
>    Well... Fedora gdb does use .debug_aranges, but that use is replaced
>    by the index.  .debug_aranges is a reasonable-enough section; it is
>    just that we really also need by-name indices to get good
>    performance, and having the whole index be mmap()able gives better
>    startup performance.
>
>    I think .debug_pub* are pretty useless.  GCC didn't even generate
>    pubtypes for years, and it had a lot of pubnames bugs... maybe it
>    still does.  What this means is that we can't really make gdb rely on
>    them.  Also, based on earlier experiments, reading these sections is
>    actually still too slow.  The index is better.

I can take this part if it helps. I guess at worst, upstream will
require a flag to get the .debug_{aranges,pubnames,pubtypes} sections
back for a little while before removing the code completely?

-- 
	Dodji


* Re: Fedora 14 debug proposal
  2010-06-09 13:54 ` Dodji Seketeli
@ 2010-06-09 16:11   ` Tom Tromey
  2010-06-11 20:31     ` Tom Tromey
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Tromey @ 2010-06-09 16:11 UTC (permalink / raw)
  To: Dodji Seketeli; +Cc: Project Archer, Jakub Jelinek

>>>>> "Dodji" == Dodji Seketeli <dodji@redhat.com> writes:

Dodji> I can take this part if it helps.

Thanks... I don't expect this to be a big deal, but I may call on you
anyway ;-)

Dodji> I guess at worst, upstream will require a flag to get the
Dodji> .debug_{aranges,pubnames,pubtypes} sections back for a little
Dodji> while before removing the code completely?

I wouldn't bother with an option, but then I'm not very concerned about
whether this particular change goes upstream.  Of course, that is easy
for me to say -- I will do whatever Jakub thinks is best here.

Tom


* Re: Fedora 14 debug proposal
  2010-06-09 16:11   ` Tom Tromey
@ 2010-06-11 20:31     ` Tom Tromey
  2010-06-14 10:17       ` Jakub Jelinek
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Tromey @ 2010-06-11 20:31 UTC (permalink / raw)
  To: Dodji Seketeli; +Cc: Project Archer, Jakub Jelinek

>>>>> "Tom" == Tom Tromey <tromey@redhat.com> writes:

Dodji> I guess at worst, upstream will require a flag to get the
Dodji> .debug_{aranges,pubnames,pubtypes} sections back for a little
Dodji> while before removing the code completely?

Tom> I wouldn't bother with an option, but then I'm not very concerned about
Tom> whether this particular change goes upstream.  Of course, that is easy
Tom> for me to say -- I will do whatever Jakub thinks is best here.

On irc, Jakub asked for this to be sent upstream, just to see if they
would accept it.  I'm struggling through a regtest, you'll see it on
gcc-patches "soon".

Tom


* Re: Fedora 14 debug proposal
  2010-06-08 20:38 Fedora 14 debug proposal Tom Tromey
  2010-06-09 13:54 ` Dodji Seketeli
@ 2010-06-11 20:33 ` Tom Tromey
  2010-06-13 10:40 ` Roland McGrath
  2 siblings, 0 replies; 11+ messages in thread
From: Tom Tromey @ 2010-06-11 20:33 UTC (permalink / raw)
  To: Project Archer; +Cc: Jakub Jelinek

>>>>> "Tom" == Tom Tromey <tromey@redhat.com> writes:

Tom> This week I want to file a Fedora 14 feature proposal touching on
Tom> debuginfo.

The page, in case you want to watch it:

https://fedoraproject.org/wiki/Features/GdbIndex

Tom


* Re: Fedora 14 debug proposal
  2010-06-14 10:17       ` Jakub Jelinek
@ 2010-06-11 20:39         ` Tom Tromey
  0 siblings, 0 replies; 11+ messages in thread
From: Tom Tromey @ 2010-06-11 20:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Dodji Seketeli, Project Archer

Jakub> I guess everybody is struggling with bootstraps/regtests lately,
Jakub> apparently every second patch or so from Honza lately causes a
Jakub> bootstrap failure or some regression, really wonder what kind of
Jakub> testing he does on his patches if any.

In my case it is just that I'm out of practice -- I let my compile farm
regtester rot, now it can't build libjava due to some broken library
dependency, etc.

Tom


* Re: Fedora 14 debug proposal
  2010-06-08 20:38 Fedora 14 debug proposal Tom Tromey
  2010-06-09 13:54 ` Dodji Seketeli
  2010-06-11 20:33 ` Tom Tromey
@ 2010-06-13 10:40 ` Roland McGrath
  2010-06-14 20:06   ` Tom Tromey
  2010-06-15 19:46   ` Tom Tromey
  2 siblings, 2 replies; 11+ messages in thread
From: Roland McGrath @ 2010-06-13 10:40 UTC (permalink / raw)
  To: tromey; +Cc: Project Archer, Jakub Jelinek

> 1. Generate index files for the separate .debug files.  This involves
>    running gdb to dump the index, something like:

Is this a magical gdb-internal format, or something that will really be
specified as a known file format?  Since the plan as stated is to
distribute this format in distro packages that will be around forever, the
format details become interesting from a long-term compatibility point of
view, not just as a gdb feature of today.

>    I don't know where the debuginfo stripping is done right now, but
>    that seems like the best place to run this script.

See /usr/lib/rpm/find-debuginfo.sh.  If DWARF compression ever gets
finished, its tools will probably replace most or all of that script.

> 2. Change GCC so that it no longer emits .debug_aranges,
>    .debug_pubnames, and .debug_pubtypes.

Please do not lump .debug_aranges in with .debug_pub*.  They are
qualitatively different cases.  .debug_aranges is a direct low-level
derivative of the CU DIEs.  The others are made with language-specific
high-level knowledge.

>    From what I can tell, no program uses these sections.  They just
>    waste space.

libdw uses .debug_aranges and does not fall back to linear search of CUs.
Removing it breaks all existing users that do any kind of lookups by PC
address.

>    I think .debug_pub* are pretty useless.  [...]

I don't doubt that.

>    I can write the gcc patch for this.

For Fedora purposes, dropping .debug_pub* sections could just as well be
done in the stripping stage.  And, I don't think they really cost much
space in the grand scheme of things.  So there is little real motivation to
fiddle gcc at all until after we have completed basically everything else
in the related realms.  (There also isn't really any reason I know of not
to drop .debug_pub* from gcc yesterday if anyone really wants to.)

> 3. If we are shipping GCC 4.5 in F14, I think we should enable the
>    .debug_types stuff by default.  This will shrink debuginfo and it
>    makes gdb use less memory.

libdw does not yet handle .debug_types.  It of course will, but I wouldn't
like to have a gcc defaults change on any queue until we are quite concrete
with getting all the support in line.

>    This one is optional, in particular I assume it will be subsumed by
>    the other DWARF compression work.

It should be, yes.  I don't see any reason that .debug_types and
DW_FORM_ref_sig8 need to survive final linking.  The normal reference
forms are more efficient for consumers to use.  Replacing ref_sig8 forms
with direct ref_addr forms requires that the targets be in .debug_info
rather than .debug_types.  So I'd been imagining that DW_TAG_type_unit
would morph into DW_TAG_compile_unit anyway.  It's still possible that
emitting .debug_types during compilation could speed up the build
process, if plain ld COMDAT handling reduces a lot of duplication before
the brute-force DWARF duplicate-subtree finder has to run.


Thanks,
Roland


* Re: Fedora 14 debug proposal
  2010-06-11 20:31     ` Tom Tromey
@ 2010-06-14 10:17       ` Jakub Jelinek
  2010-06-11 20:39         ` Tom Tromey
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2010-06-14 10:17 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Dodji Seketeli, Project Archer

On Fri, Jun 11, 2010 at 02:31:44PM -0600, Tom Tromey wrote:
> >>>>> "Tom" == Tom Tromey <tromey@redhat.com> writes:
> 
> Dodji> I guess at worst, upstream will require a flag to get the
> Dodji> .debug_{aranges,pubnames,pubtypes} sections back for a little
> Dodji> while before removing the code completely?
> 
> Tom> I wouldn't bother with an option, but then I'm not very concerned about
> Tom> whether this particular change goes upstream.  Of course, that is easy
> Tom> for me to say -- I will do whatever Jakub thinks is best here.
> 
> On irc, Jakub asked for this to be sent upstream, just to see if they
> would accept it.  I'm struggling through a regtest, you'll see it on
> gcc-patches "soon".

I guess everybody is struggling with bootstraps/regtests lately,
apparently every second patch or so from Honza lately causes a bootstrap failure
or some regression, really wonder what kind of testing he does on his
patches if any.

	Jakub


* Re: Fedora 14 debug proposal
  2010-06-13 10:40 ` Roland McGrath
@ 2010-06-14 20:06   ` Tom Tromey
  2010-06-15 19:46   ` Tom Tromey
  1 sibling, 0 replies; 11+ messages in thread
From: Tom Tromey @ 2010-06-14 20:06 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Project Archer, Jakub Jelinek

Tom> 1. Generate index files for the separate .debug files.  This
Tom> involves running gdb to dump the index, something like:

Roland> Is this a magical gdb-internal format, or something that will
Roland> really be specified as a known file format?  Since the plan as
Roland> stated is to distribute this format in distro packages that will
Roland> be around forever, the format details become interesting from a
Roland> long-term compatibility point of view, not just as a gdb feature
Roland> of today.

Magical gdb format.

I've appended the comment that documents it.

Tom> 2. Change GCC so that it no longer emits .debug_aranges,
Tom> .debug_pubnames, and .debug_pubtypes.

Roland> Please do not lump .debug_aranges in with .debug_pub*.  They are
Roland> qualitatively different cases.  .debug_aranges is a direct low-level
Roland> derivative of the CU DIEs.  The others are made with language-specific
Roland> high-level knowledge.

Ok.

Tom

/* The mapped index file format is designed to be directly mmap()able
   on any architecture.  In most cases, a datum is represented using a
   little-endian 32-bit integer value, called an offset_type.
   Big-endian machines must byte-swap the values before using them.
   Exceptions to this rule are noted.  The data is laid out such that
   alignment is always respected.

   A mapped index consists of several sections.
   
   1. The file header.  This is a sequence of values, of offset_type
   unless otherwise noted:
   [0] The version number.  Currently 1.
   [1] The mtime of the objfile, as a 64-bit little-endian value.
   [2] The size of the objfile, as a 64-bit little-endian value.
   The size and mtime are used to confirm that the index matches the
   objfile.
   [3] The offset, from the start of the file, of the CU list.
   [4] The offset, from the start of the file, of the address section.
   [5] The offset, from the start of the file, of the symbol table.
   [6] The offset, from the start of the file, of the constant pool.

   2. The CU list.  This is a sequence of pairs of offset_type values.
   The first element in each pair is the offset of a CU in the
   .debug_info section.  The second element in each pair is the length
   of that CU.  References to a CU elsewhere in the map are done using
   a CU index, which is just the 0-based index into this table.
   
   3. The address section.  The address section consists of a sequence
   of address entries.  Each address entry has three elements.
   [0] The low address.  This is a 64-bit little-endian value.
   [1] The high address (plus one).  This is also a 64-bit
   little-endian value.
   [2] The CU index.  This is an offset_type value.
   
   4. The symbol table.  This is a hash table.  The size of the hash
   table is always a power of 2.  The initial hash and the step are
   currently defined by the `find_slot' function.
   
   Each slot in the hash table consists of a pair of offset_type
   values.  The first value is the offset of the symbol's name in the
   constant pool.  The second value is the offset of the CU vector in
   the constant pool.
   
   If both values are 0, then this slot in the hash table is empty.
   This is ok because while 0 is a valid constant pool index, it
   cannot be a valid index for both a string and a CU vector.
   
   A string in the constant pool is stored as a \0-terminated string,
   as you'd expect.
   
   A CU vector in the constant pool is a sequence of offset_type
   values.  The first value is the number of CU indices in the vector.
   Each subsequent value is the index of a CU in the CU list.  This
   element in the hash table is used to indicate which CUs define the
   symbol.
   
   5. The constant pool.  This is simply a bunch of bytes.  It is
   organized so that alignment is correct: CU vectors are stored
   first, followed by strings.  */
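
As a reading aid, here is a minimal sketch of how a consumer might
decode the header, the CU list, and one CU vector from the format
described in the comment above.  It is not gdb's code.  Several details
are assumptions and are marked as such in the comments: that the header
fields are packed back to back with no padding, that the CU list ends
where the address section begins, and that CU vector offsets are
relative to the start of the constant pool.  All function names are
made up for the example.

```python
import struct

# Assumption: fields are packed in the listed order with no padding:
# version (u32), mtime (u64), size (u64), then four u32 section offsets.
HEADER = struct.Struct("<IQQIIII")
CU_ENTRY = struct.Struct("<II")  # (offset in .debug_info, length of CU)


def parse_header(data):
    """Decode the file header into a dict of named fields."""
    names = ("version", "mtime", "size",
             "cu_off", "addr_off", "sym_off", "pool_off")
    return dict(zip(names, HEADER.unpack_from(data, 0)))


def read_cu_list(data, hdr):
    """Return the CU list as (debug_info_offset, length) pairs.

    Assumption: the CU list runs from cu_off up to addr_off.
    """
    last_start = hdr["addr_off"] - CU_ENTRY.size
    return [CU_ENTRY.unpack_from(data, pos)
            for pos in range(hdr["cu_off"], last_start + 1, CU_ENTRY.size)]


def read_cu_vector(data, hdr, vec_off):
    """Decode a CU vector: a count followed by that many CU indices.

    Assumption: vec_off is relative to the start of the constant pool.
    """
    base = hdr["pool_off"] + vec_off
    (count,) = struct.unpack_from("<I", data, base)
    return list(struct.unpack_from(f"<{count}I", data, base + 4))


def slot_is_empty(name_off, vec_off):
    """A hash slot is empty when both of its values are 0."""
    return name_off == 0 and vec_off == 0
```

The explicit "<" in each struct format forces little-endian decoding
with no implicit padding, which matches the "byte-swap on big-endian
machines" rule.  The symbol hash lookup itself is left out because the
comment defers the hash and step to `find_slot`, which is not shown
here.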


* Re: Fedora 14 debug proposal
  2010-06-13 10:40 ` Roland McGrath
  2010-06-14 20:06   ` Tom Tromey
@ 2010-06-15 19:46   ` Tom Tromey
  2010-06-15 22:26     ` Roland McGrath
  1 sibling, 1 reply; 11+ messages in thread
From: Tom Tromey @ 2010-06-15 19:46 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Project Archer, Jakub Jelinek

>>>>> "Roland" == Roland McGrath <roland@redhat.com> writes:

Roland> For Fedora purposes, dropping .debug_pub* sections could just as
Roland> well be done in the stripping stage.  And, I don't think they
Roland> really cost much space in the grand scheme of things.  So there
Roland> is little real motivation to fiddle gcc at all until after we
Roland> have completed basically everything else in the related realms.

Doing it in gcc can save some space, by not putting useless stuff into
the string table.  I didn't try to measure this.  Just stripping the
sections does not save much space, as you hypothesized; I did a test
with objcopy on all the debuginfo I have currently installed (OO.o plus
other random bits):

opsy. du -s debug stripped
1883432	debug
1812156	stripped
  71276 savings

The indices are about 3x the size of the savings:

opsy. du -s Out
208876	Out
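
For anyone who wants to redo the arithmetic, the figures above can be
checked with a few lines.  The numbers are copied from the du output;
the variable names are made up, and the unit is whatever block size du
used on that machine (assumed here to be the same for all three runs).

```python
# Block counts copied from the "du -s" output above.
debug_blocks = 1883432     # debuginfo as installed
stripped_blocks = 1812156  # after removing the sections with objcopy
index_blocks = 208876      # generated index files ("Out")

# Space recovered by stripping alone.
savings = debug_blocks - stripped_blocks

# How the index size compares to that saving, and the net effect
# of stripping the sections and shipping the indices together.
index_to_savings = index_blocks / savings
savings_pct = 100 * savings / debug_blocks
net_growth = index_blocks - savings

print(f"savings:          {savings} blocks ({savings_pct:.1f}% of debug)")
print(f"index / savings:  {index_to_savings:.2f}x")
print(f"net growth:       {net_growth} blocks")
```

So stripping recovers a little under 4% of the tree, and shipping the
indices adds back roughly three times that amount.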

Tom> 3. If we are shipping GCC 4.5 in F14, I think we should enable the
Tom> .debug_types stuff by default.  This will shrink debuginfo and it
Tom> makes gdb use less memory.

Roland> libdw does not yet handle .debug_types.  It of course will, but
Roland> I wouldn't like to have a gcc defaults change on any queue until
Roland> we are quite concrete with getting all the support in line.

Yeah, that would be good for us too; some of the recent gcc changes
caught us by surprise.  OTOH I don't want to stop Jakub's gusto, maybe
just get a little more warning of coming gdb-impacting changes.

Tom> This one is optional, in particular I assume it will be subsumed by
Tom> the other DWARF compression work.

Roland> It should be, yes.  I don't see any reason that .debug_types and
Roland> DW_FORM_ref_sig8 need to survive final linking.  The normal reference
Roland> forms are more efficient for consumers to use.

Why is that?  I looked at the gdb code here and nothing really stood out.

Tom


* Re: Fedora 14 debug proposal
  2010-06-15 19:46   ` Tom Tromey
@ 2010-06-15 22:26     ` Roland McGrath
  0 siblings, 0 replies; 11+ messages in thread
From: Roland McGrath @ 2010-06-15 22:26 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Project Archer, Jakub Jelinek

> Roland> It should be, yes.  I don't see any reason that .debug_types and
> Roland> DW_FORM_ref_sig8 need to survive final linking.  The normal reference
> Roland> forms are more efficient for consumers to use.
> 
> Why is that?  I looked at the gdb code here and nothing really stood out.

ref_sig8 is a key to match in searching through type units.  (Presumably a
hash table lookup among already-interned units, interning more linearly as
needed.)  The ref forms are direct pointers into the file.  In the case of
ref_addr (the case for any actual sharing/compression), a consumer needs to
figure out which CU it's in and intern that CU (i.e. track at least its
header details, the total of "interning" that libdw does), which is a
similar search and on-demand interning (in libdw this one is a tree-based
search to match the file-offset bounds of the CU).  For a consumer like GDB
that interns at the DIE level, it's presumably a similar lookup (hash table
or btree or whatever) keyed on the file offset to match a DIE previously
interned.  So it is simpler in theory but perhaps a wash in practice.
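
To make the comparison concrete, here is a toy sketch of the two
lookups described above: a hash lookup keyed on the 8-byte signature
for ref_sig8, and an ordered search over CU offset bounds for ref_addr.
It is not libdw's or GDB's implementation.  The class, its methods, and
the simplified (start, length) unit records are all hypothetical, and a
sorted list with bisect stands in for the tree-based search.

```python
from bisect import bisect_right


class UnitTable:
    """Toy model of a consumer's table of already-interned units."""

    def __init__(self, cus, type_units):
        # cus: iterable of (start_offset, length) pairs in .debug_info,
        # where length is taken to cover the whole unit (a simplification).
        self._cus = sorted(cus)
        self._starts = [start for start, _ in self._cus]
        # type_units: mapping of 64-bit signature -> type unit offset.
        self._by_sig = dict(type_units)

    def resolve_ref_addr(self, offset):
        """Find the CU whose bounds contain a section-relative offset."""
        i = bisect_right(self._starts, offset) - 1
        if i < 0:
            return None
        start, length = self._cus[i]
        return (start, length) if offset < start + length else None

    def resolve_ref_sig8(self, signature):
        """Find the type unit matching a signature, if interned."""
        return self._by_sig.get(signature)
```

Both paths end up as one keyed lookup followed by on-demand interning,
which is the sense in which the difference may be a wash in practice.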

What might be more important is the space savings.  ref_sig8 itself uses
twice the space of ref_addr.  But beyond that, each referent must get its
own type unit, with space for the unit header, plus duplicates of the
containing DIE structure (levels of namespace, class, etc.).  In contrast,
optimal direct compression needs only as many unit headers (for the
partial_unit or compile_unit) as there are distinct sets of sharing
references.  A shared partial_unit contains many referent DIEs nested in
the single copy of the containing DIE structure, since references to
foo::bar::baz::type1 and foo::bar::baz::type2::innertype3, etc., are all
just direct pointers into different subtrees of the same larger tree.

Anyway, the proof will be in the putative pudding.  When we have
compression working and libdw capable of handling ref_sig8, then it will be
fairly straightforward to try preserving type units and ref_sig8's as they
are (along with partial_unit-based compression of everything else) and
compare that to morphing everything into direct references and compressing
that way.


Thanks,
Roland


Thread overview: 11+ messages
2010-06-08 20:38 Fedora 14 debug proposal Tom Tromey
2010-06-09 13:54 ` Dodji Seketeli
2010-06-09 16:11   ` Tom Tromey
2010-06-11 20:31     ` Tom Tromey
2010-06-14 10:17       ` Jakub Jelinek
2010-06-11 20:39         ` Tom Tromey
2010-06-11 20:33 ` Tom Tromey
2010-06-13 10:40 ` Roland McGrath
2010-06-14 20:06   ` Tom Tromey
2010-06-15 19:46   ` Tom Tromey
2010-06-15 22:26     ` Roland McGrath
