In-Reply-To: <20091202193852.GA23631@caradoc.them.org>
References: <4A7FE28D.4050901@redhat.com> <4A8D8868.3010302@redhat.com> <20091202051717.GA24978@caradoc.them.org> <20091202173518.GA13838@caradoc.them.org> <20091202193852.GA23631@caradoc.them.org>
Date: Thu, 03 Dec 2009 01:46:00 -0000
Message-ID: <8ac60eac0912021746g3cc9b543j1b175cf80b433705@mail.gmail.com>
Subject: Re: [RFC] Proposal for a new DWARF name index section
From: Paul Pluzhnikov
To: Daniel Jacobowitz
Cc: Tom Tromey, Cary Coutant, Dodji Seketeli, "GDB/Archer list"

On Wed, Dec 2, 2009 at 11:38 AM, Daniel Jacobowitz wrote:
> Well, inherent in the cache approach (IMO) is a system-provided cache;
> for installed libraries, the cache data could be added to a debuginfo
> file.  Of course, that assumes GDB's format stays "relatively stable"
> across GDB updates.

FWIW, I've used the following approach on a previous product X:

- When a new binary is detected, a copy of X is invoked to parse all
  the needed debug info into internal form and write it to a cache file.

- Once that copy exits, the cache file is directly mmap()ed by X.

- Cache files older than one week, and cache files prepared from
  binaries which no longer exist in their original location, are
  pruned to keep the cache size down.

The cache file records the version of X, so when a new version of X
is shipped, the cache is automatically rebuilt.  It also records the
path/timestamp/inode/size of the target binary, so if e.g. one of the
shared libs has been rebuilt since the last run, only that one shared
library must be re-processed.

This trades startup speed against disk space, and disk is usually
very cheap now.

One of our typical usage scenarios is a tiny executable linked with
1000+ C++ shared libraries.  Simply re-running the test a second time
in a row in GDB takes 1+ minutes, because GDB discards and re-reads
the debug info for each solib (it used to take 6+ minutes before my
DWARF mmap changes).

The major CPU consumers in my tests are now:

CPU: AMD64 processors, speed 2200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000

samples  %        symbol name
43092    8.2847   read_partial_die
38243    7.3525   strcmp_iw_ordered
36744    7.0643   read_attribute_value
28887    5.5537   cpname_parse
28849    5.5464   d_print_comp
27731    5.3315   htab_hash_string
21975    4.2248   cp_canonicalize_string
20736    3.9866   load_partial_dies
18098    3.4795   cpname_lex
15649    3.0086   lookup_minimal_symbol
15156    2.9138   msymbol_hash_iw
14185    2.7272   htab_find_slot_with_hash

I am guessing that a GDB cache of pre-canonicalized strings would save
a *lot* of CPU under this scenario, and there is no reason you can't
put any other indices into the cache, and no need for the cache file
format to stay stable -- a newer version of GDB will simply rebuild
what it needs on demand.

Rough sketches of the validation and pruning checks I have in mind
follow below my sig.

-- 
Paul Pluzhnikov
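
To make the validation step concrete, here is a minimal sketch in C.
All of the names here (struct cache_header, CACHE_TOOL_VERSION,
map_cache_if_valid) are made up for illustration -- they do not come
from X or from GDB:

  #include <fcntl.h>
  #include <stdint.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>

  #define CACHE_TOOL_VERSION 0x00010002  /* Bumped on every release of X.  */

  struct cache_header
  {
    uint32_t tool_version;     /* Rebuild everything if X was updated.  */
    char binary_path[1024];    /* Where the target binary lived.  */
    int64_t binary_mtime;      /* st_mtime of the binary when cached.  */
    uint64_t binary_inode;     /* st_ino; catches in-place rebuilds.  */
    uint64_t binary_size;      /* st_size; cheap extra sanity check.  */
    /* ... parsed debug info follows the header ...  */
  };

  /* Map the cache for BINARY_PATH read-only and return it, or return
     NULL if the cache is missing or stale and must be rebuilt.  */

  static void *
  map_cache_if_valid (const char *cache_path, const char *binary_path,
                      size_t *len_out)
  {
    struct stat cache_st, bin_st;
    int fd = open (cache_path, O_RDONLY);

    if (fd < 0)
      return NULL;

    if (fstat (fd, &cache_st) != 0
        || (size_t) cache_st.st_size < sizeof (struct cache_header)
        || stat (binary_path, &bin_st) != 0)
      {
        close (fd);
        return NULL;
      }

    void *map = mmap (NULL, cache_st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close (fd);
    if (map == MAP_FAILED)
      return NULL;

    const struct cache_header *hdr = map;
    if (hdr->tool_version != CACHE_TOOL_VERSION
        || strcmp (hdr->binary_path, binary_path) != 0
        || hdr->binary_mtime != (int64_t) bin_st.st_mtime
        || hdr->binary_inode != (uint64_t) bin_st.st_ino
        || hdr->binary_size != (uint64_t) bin_st.st_size)
      {
        /* Stale: X or the binary changed.  Not an error -- the caller
           re-parses just this one binary and rewrites its cache file.  */
        munmap (map, cache_st.st_size);
        return NULL;
      }

    *len_out = cache_st.st_size;
    return map;
  }

The point is that a stale header is not an error; the caller falls
back to re-parsing that one binary, so the cache format never has to
be stable across versions.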
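
And a companion sketch of the pruning pass, reusing the hypothetical
struct cache_header above -- again, made-up names, just to show the
two conditions (age, and a vanished original binary):

  #include <stdio.h>
  #include <sys/stat.h>
  #include <time.h>
  #include <unistd.h>

  #define CACHE_MAX_AGE (7 * 24 * 60 * 60)  /* One week, in seconds.  */

  /* Delete CACHE_PATH if it is more than a week old, or if the binary
     it was built from no longer exists at its recorded path.  */

  static void
  maybe_prune_cache (const char *cache_path)
  {
    struct stat cache_st;

    if (stat (cache_path, &cache_st) != 0)
      return;

    int stale = (time (NULL) - cache_st.st_mtime) > CACHE_MAX_AGE;

    if (!stale)
      {
        /* Read just the header to recover the original binary's path.  */
        struct cache_header hdr;
        FILE *f = fopen (cache_path, "rb");

        if (f == NULL)
          return;
        size_t n = fread (&hdr, sizeof hdr, 1, f);
        fclose (f);

        if (n == 1 && access (hdr.binary_path, F_OK) != 0)
          stale = 1;
      }

    if (stale)
      unlink (cache_path);
  }

In practice you'd run this over the whole cache directory at startup;
it is cheap, since only stat() and (rarely) one header read are needed
per cache file.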