Re: [RFC] Proposal for a new DWARF name index section

public inbox for archer@sourceware.org

From: Paul Pluzhnikov <ppluzhnikov@google.com>
To: Daniel Jacobowitz <drow@false.org>
Cc: Tom Tromey <tromey@redhat.com>,
	Cary Coutant <ccoutant@google.com>,
	Dodji Seketeli <dodji@redhat.com>,
	"GDB/Archer list" <archer@sourceware.org>
Subject: Re: [RFC] Proposal for a new DWARF name index section
Date: Thu, 03 Dec 2009 01:46:00 -0000
Message-ID: <8ac60eac0912021746g3cc9b543j1b175cf80b433705@mail.gmail.com>
In-Reply-To: <20091202193852.GA23631@caradoc.them.org>

On Wed, Dec 2, 2009 at 11:38 AM, Daniel Jacobowitz <drow@false.org> wrote:

> Well, inherent in the cache approach (IMO) is a system-provided cache;
> for installed libraries, the cache data could be added to a debuginfo
> file.  Of course, that assumes GDB's format stays "relatively stable"
> across GDB updates.

FWIW, I've used the following approach on a previous product X:

- When a new binary is detected, a copy of X is invoked to parse all
  the needed debug info into an internal form and write it to a cache file.
- Once the copy exits, X mmap()s the cache file directly.
- Cache files older than 1 week, and cache files prepared from
  binaries which no longer exist in their original location, are
  pruned to keep cache size down.
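
The pruning rule above could be sketched roughly as follows. This is only
an illustration in Python, not the code X used: the names should_prune and
prune, the mapping from cache file to original binary, and the choice of
modification time as the measure of "age" are all assumptions.

```python
import os
import time

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # one week, as in the rule above


def should_prune(cache_path, source_path, now=None, max_age=MAX_AGE_SECONDS):
    """Return True if the cache file is stale or its binary has gone away.

    Assumption: "older than 1 week" is measured from the cache file's
    modification time; the original tool may have used something else.
    """
    now = time.time() if now is None else now
    too_old = now - os.stat(cache_path).st_mtime > max_age
    source_gone = not os.path.exists(source_path)
    return too_old or source_gone


def prune(entries, now=None):
    """Delete stale cache files.

    `entries` is a hypothetical mapping of cache file path to the path of
    the binary it was built from. Returns the list of removed cache files.
    """
    removed = []
    for cache_path, source_path in entries.items():
        if should_prune(cache_path, source_path, now):
            os.remove(cache_path)
            removed.append(cache_path)
    return removed
```

A real implementation would read the original binary's path from the cache
file itself rather than from a separate mapping.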

The cache file contains the version of X, so when a new version of X
is shipped, the cache is automatically rebuilt.

It also contains path/timestamp/inode/size for the target binary,
so if e.g. one of the shared libs has been rebuilt since the last run,
only that one shared library must be re-processed.
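
A minimal sketch of that validity check, in Python for brevity. The
CacheHeader layout, the TOOL_VERSION string, and the function names are
hypothetical; only the set of fields (tool version plus
path/timestamp/inode/size) comes from the description above.

```python
import os
from dataclasses import dataclass

TOOL_VERSION = "1.0"  # hypothetical version string of the tool writing the cache


@dataclass(frozen=True)
class CacheHeader:
    """Identity of the binary a cache file was built from (assumed layout)."""
    tool_version: str
    path: str
    mtime_ns: int
    inode: int
    size: int


def header_for(path, tool_version=TOOL_VERSION):
    """Build the header describing the binary as it exists on disk now."""
    st = os.stat(path)
    return CacheHeader(tool_version, os.path.abspath(path),
                       st.st_mtime_ns, st.st_ino, st.st_size)


def cache_is_valid(cached, tool_version=TOOL_VERSION):
    """Return True if the cache can be reused without re-parsing."""
    if cached.tool_version != tool_version:
        return False  # a new tool version shipped: rebuild
    try:
        current = header_for(cached.path, tool_version)
    except OSError:
        return False  # binary moved or deleted
    return current == cached  # any change in mtime/inode/size invalidates
```

Because each shared library gets its own header, rebuilding one library
invalidates only that library's cache file.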

This trades disk space for startup speed, and disk is usually
very cheap now.

One of our typical usage scenarios is a tiny executable linked with
1000+ C++ shared libraries. Simply re-running the test in GDB a second
time takes 1+ minutes, because GDB discards and re-reads the debug info
for each solib (it used to take 6+ minutes before my DWARF mmap
changes).

The major CPU consumers in my tests are now:

CPU: AMD64 processors, speed 2200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
43092     8.2847  read_partial_die
38243     7.3525  strcmp_iw_ordered
36744     7.0643  read_attribute_value
28887     5.5537  cpname_parse
28849     5.5464  d_print_comp
27731     5.3315  htab_hash_string
21975     4.2248  cp_canonicalize_string
20736     3.9866  load_partial_dies
18098     3.4795  cpname_lex
15649     3.0086  lookup_minimal_symbol
15156     2.9138  msymbol_hash_iw
14185     2.7272  htab_find_slot_with_hash

I am guessing that a GDB cache of pre-canonicalized strings would
save a *lot* of CPU under this scenario, and there is no reason
you can't put other indices into the cache as well. Nor does the
cache file need a stable format -- a newer version of GDB will simply
rebuild what it needs on demand.
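
To illustrate the idea of a persistent cache of pre-canonicalized strings,
here is a rough Python sketch. Everything in it is an assumption made for
illustration: toy_canonicalize is a stand-in that merely collapses
whitespace (it is not what cp_canonicalize_string does), and the JSON file
and the CanonicalNameCache class are hypothetical.

```python
import json
import os


def toy_canonicalize(name):
    """Stand-in canonicalizer: collapse runs of whitespace.

    This is NOT GDB's algorithm; it only plays the role of an expensive
    name-canonicalization step.
    """
    return " ".join(name.split())


class CanonicalNameCache:
    """Maps raw names to canonical names and persists the table on disk."""

    def __init__(self, path, canonicalize=toy_canonicalize):
        self.path = path
        self.canonicalize = canonicalize
        self.misses = 0  # how many names had to be canonicalized this run
        self.table = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.table = json.load(f)

    def lookup(self, name):
        """Return the canonical form, computing it only on a cache miss."""
        try:
            return self.table[name]
        except KeyError:
            self.misses += 1
            result = self.table[name] = self.canonicalize(name)
            return result

    def save(self):
        """Write the table out so the next run starts warm."""
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.table, f)
```

On a second run over the same libraries every lookup would hit the saved
table, so the canonicalization cost would be paid only once per name.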

-- 
Paul Pluzhnikov

