Re: Inter-CU DWARF size optimizations and gcc -flto

From: Tom Tromey <tromey@redhat.com>
To: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: archer@sourceware.org, Jakub Jelinek <jakub@redhat.com>
Subject: Re: Inter-CU DWARF size optimizations and gcc -flto
Date: Wed, 22 Feb 2012 21:56:00 -0000
Message-ID: <87hayio7ld.fsf@fleche.redhat.com>
In-Reply-To: <20120201132307.GA32578@host2.jankratochvil.net> (Jan Kratochvil's message of "Wed, 1 Feb 2012 14:23:09 +0100")

Jan> (b) .gdb_index will have limited scope, only to select which
Jan> objfiles to expand, no longer to select which CUs to expand.

I suspect we are going to need a better approach here anyway.
I sometimes hear about programs with more than 800 shared libraries.
If you assume separate debuginfo this means 1600 objfiles.
I think this will just crush most of the existing algorithms in gdb.

Jan> (c) Partial CU expansion Tom Tromey talks about is a must in such case.

I realized I never wrote up how this could work.  The below is sort of a
sketch that devolves into random thoughts.

I have been thinking about it since we discussed it and I think it has a
potentially severe problem.

The basic idea is simple: right now we have two DWARF readers in
dwarf2read.c, the psymtab reader and the full symbol reader.

Currently, when a lookup finds a psymbol, we expand the whole CU to full
symbols.  This normally isn't too bad -- but there are some CUs out
there in practice that are quite large, and the delay reading them is
noticeable.

So, what if we unified the two readers -- eliminating one source of bugs
-- and also changed CU expansion to be DIE-based?  That is, in symtab.c,
before returning a symbol from a symtab, we would call some back-end
function to expand the symbol.  The DWARF reader would then just read
the DIEs needed to instantiate that one particular symbol plus whatever
dependencies (types usually) it has.
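
To make the DIE-based expansion concrete, here is a minimal sketch in
Python rather than gdb's C.  Every name in it (Die, LazySymtab, lookup)
is hypothetical and chosen for illustration; nothing here is gdb's
actual API.  It assumes a cheap first pass has already recorded a
name -> DIE offset map, and that a symbol's only dependency is its type.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Die:
    """Hypothetical stand-in for one debug-info entry."""
    offset: int
    name: str
    type_offset: Optional[int] = None  # offset of the DIE for this symbol's type


@dataclass
class LazySymtab:
    """Toy symtab that expands one DIE (plus its dependencies) per lookup."""
    dies: Dict[int, Die]      # stand-in for the CU's debug info
    by_name: Dict[str, int]   # name -> DIE offset, from a cheap first pass
    expanded: Dict[int, str] = field(default_factory=dict)

    def _expand(self, offset: int) -> None:
        if offset in self.expanded:
            return
        die = self.dies[offset]
        # Record the entry before following dependencies so that
        # self-referential types cannot recurse forever.
        self.expanded[offset] = die.name
        if die.type_offset is not None:
            self._expand(die.type_offset)

    def lookup(self, name: str) -> Optional[str]:
        offset = self.by_name.get(name)
        if offset is None:
            return None
        self._expand(offset)  # the "back-end function" called before returning
        return self.expanded[offset]
```

Looking up one function in this model touches only that function's DIE
and its type; every other DIE in the CU stays unread until something
asks for it.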

Ok, that sounds good, but there is a problem: struct symbol is really
big, much bigger than a psymbol.  We could just read psymbol-like
structs on our first pass, but we need somewhere to store the DIE offset
for efficient expansion.

We can solve that by updating and applying an old patch that shrinks
psymbol.  Then we can use the saved space to store the DIE offset -- so
this change can be space-neutral.

However, this neglects the bcache.  In fact, the bcache sinks the whole
project: it only saves space when psymbols are byte-for-byte identical,
and DIE offsets will vary by definition.
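
The effect on the bcache can be shown with a toy model.  This is only a
sketch: gdb's real bcache works on raw bytes, whereas the hypothetical
BCache below interns Python tuples, and load_psymbols is an invented
helper.  The point is just that a content-addressed cache shares an
entry only when the whole record matches.

```python
class BCache:
    """Toy content-addressed cache: identical records share one entry."""

    def __init__(self):
        self._entries = {}

    def intern(self, record: tuple) -> tuple:
        # Return the previously stored copy if an identical record exists.
        return self._entries.setdefault(record, record)

    def __len__(self) -> int:
        return len(self._entries)


def load_psymbols(cache, objfiles, with_die_offset):
    """Intern every psymbol and return the number of distinct entries.

    Each objfile is a list of (name, kind, die_offset) tuples; the layout
    is an assumption made for this example.
    """
    for symbols in objfiles:
        for name, kind, die_offset in symbols:
            if with_die_offset:
                record = (name, kind, die_offset)
            else:
                record = (name, kind)
            cache.intern(record)
    return len(cache)
```

With the same typedef appearing in two objfiles at different offsets,
the offset-free records collapse to one entry while the records that
carry the offset do not.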

Well, the DIE offset sinks this particular approach.  Maybe there is
another approach, not space-neutral but also not too bad, that can be
used.  For example, keeping the bcache but having the symtabs contain
both {psymbol+DIE} pairs and fully-expanded symbols (depending on what
has been expanded).
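
One way to picture that alternative, again as a hedged Python sketch
with invented names (PSymbol, FullSymbol, MixedSymtab, read_die): the
psymbol stays offset-free so it can still be shared through the cache,
the DIE offset lives next to it in the symtab slot, and a slot is
swapped for a full symbol the first time it is looked up.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PSymbol:
    """Offset-free partial symbol, so identical ones can be shared."""
    name: str
    kind: str


@dataclass
class FullSymbol:
    name: str
    kind: str
    die_offset: int
    details: str  # whatever the (hypothetical) DIE reader produced


class MixedSymtab:
    """Slots hold either a (psymbol, DIE offset) pair or a FullSymbol."""

    def __init__(self, bcache: Dict[PSymbol, PSymbol]):
        self.bcache = bcache
        self.slots = {}

    def add_partial(self, name: str, kind: str, die_offset: int) -> None:
        psym = PSymbol(name, kind)
        psym = self.bcache.setdefault(psym, psym)  # share identical psymbols
        self.slots[name] = (psym, die_offset)      # offset is per-symtab

    def lookup(self, name: str, read_die: Callable[[int], str]):
        slot = self.slots.get(name)
        if slot is None:
            return None
        if isinstance(slot, tuple):                # not yet expanded
            psym, offset = slot
            slot = FullSymbol(psym.name, psym.kind, offset, read_die(offset))
            self.slots[name] = slot
        return slot
```

The cost relative to the space-neutral plan is the extra offset per
slot, which is the "not space-neutral but also not too bad" trade-off.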

If we went a bit deeper and had hierarchical symbol tables, we could
skip whole DIE subtrees even in the partial reader.
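
Skipping subtrees can be sketched as follows.  DWARF lets a DIE carry a
DW_AT_sibling attribute pointing at its next sibling; the hypothetical
FlatDie below models that with a plain list index, and scan_top_level is
an invented helper, not anything in dwarf2read.c.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlatDie:
    """One entry in a flattened DIE list.

    `sibling` is the list index of the next DIE at the same nesting
    level (a stand-in for DW_AT_sibling); children sit in between.
    """
    name: str
    sibling: int


def scan_top_level(dies: List[FlatDie]) -> Tuple[List[str], int]:
    """Visit only top-level DIEs, jumping over each one's children."""
    names = []
    visited = 0
    i = 0
    while i < len(dies):
        names.append(dies[i].name)
        visited += 1
        i = dies[i].sibling  # skip the whole subtree in one step
    return names, visited
```

A reader built this way would descend into a namespace or class only
when a lookup actually needs something inside it.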

A related idea here that I was idly wondering about is whether we could
make the psymtab reader hierarchical without touching full symbols.

The deeper rewrite seems eventually necessary.  The symbol table code is
pretty horrible, in multiple ways.  However, at least for me it hasn't
yet reached the pain point where we can justify spending months and
months on it, which I think is what it would take.

Your thoughts welcome.

Tom
