* more libunwind startup-overhead tuning
From: David Mosberger @ 2004-01-14 2:58 UTC
To: Jakub Jelinek, Jim Wilson, Richard Henderson, gcc-patches,
Ulrich Drepper, libc-hacker
Cc: davidm
>>>>> On Sat, 20 Dec 2003 23:32:19 -0800, David Mosberger <davidm@linux.hpl.hp.com> said:
David> The dynamic relocation count is now down from 747 to 142 (50
David> of them are NONE relocs).
David> I'm sure there is more tuning that can be done to minimize
David> load-time overhead, but i'll look into those after finishing
David> the DWARF unwinder.
I figured out a way to split the local-only unwinder into a separate
library without creating API/ABI incompatibilities (except for rather
esoteric corner cases, which won't affect GCC, GDB, or other major
libunwind users). With a separate local-only libunwind.so, the dynamic
relocation count shrinks to 72 (32 of which are NONE relocs).
If I use LD_DEBUG=statistics, I get the following dynamic reloc counts
("final number of relocations"):
no-op program without libunwind: 90
no-op program with libunwind v0.96: 112
no-op program with separate, local-only libunwind: 93
To measure the actual execution-time impact, I created a no-op program
"empty" whose main() function returns immediately. Then I created a
statically linked "forker" program which spawns "empty" 10000 times.
I used LD_PRELOAD to add a dependency on libunwind where desired. The
results are below (execution times in seconds, as reported by "time"):
                                                     real   user  system
no-op program without libunwind:                    7.347  2.401   4.940
no-op program with libunwind v0.96:                 8.253  2.858   5.345
no-op program with separate, local-only libunwind:  7.878  2.627   5.250
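The relative overheads follow directly from the "real" column of the
table: overhead = (t_with - t_without) / t_without. A minimal C sketch
of that arithmetic (overhead_pct() and print_overheads() are
illustrative names, not part of libunwind):

```c
#include <stdio.h>

/* Relative overhead, in percent, of a measurement `with` against a
   baseline `base` (same unit for both, e.g. seconds of real time). */
double overhead_pct(double base, double with)
{
    return (with - base) / base * 100.0;
}

/* Illustrative helper: print the overhead of both libunwind variants,
   using the "real" column of the timing table. */
void print_overheads(void)
{
    const double without_libunwind = 7.347;
    const double libunwind_v096 = 8.253;
    const double local_only = 7.878;

    printf("libunwind v0.96:      %.1f%%\n",
           overhead_pct(without_libunwind, libunwind_v096));
    printf("local-only libunwind: %.1f%%\n",
           overhead_pct(without_libunwind, local_only));
}
```

Calling print_overheads() prints 12.3% for v0.96 and 7.2% for the
separate, local-only library, matching the "about 7%" figure quoted
below.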
So, with the local-only version of libunwind, pretty much the absolute
worst-case overhead of always linking dynamically against libunwind
seems to be about 7% (7.878 s vs. 7.347 s real time). Remember: this
worst case applies only to shared objects which do not link against
anything other than ld.so and libc.so. In my opinion, this is a
reasonably small overhead (if you really want minimal startup times
for such tiny programs, static linking will give much better results
anyhow).
For completeness, I have attached the profiles for the "no libunwind"
and "local-only libunwind" cases below. The caveat is that the
profiles cover all 10,000 invocations of "empty" and that the call
counts were obtained via sampling, so they're not 100% accurate.
Even so, the call counts look sensible. For example, in the
no-libunwind case, _dl_relocate_object() gets called about 3 times per
"empty" invocation (main program, ld.so, libc, I think) and about 4
times in the libunwind case.
I think the only way to essentially eliminate the overhead altogether
would be to use a scheme analogous to the one used in libpthread.
That is, provide stub versions of _Unwind_*() which, when invoked,
dlopen() libunwind.so and redirect the calls to the appropriate
entry points in libunwind.so. However, to avoid a dependency on
-ldl (which would defeat the entire purpose of the stubs), libgcc
would have to use __libc_dlopen_mode(), which is probably undesirable.
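The stub idea can be sketched as follows, under stated assumptions. It
uses the public dlopen()/dlsym() interface rather than
__libc_dlopen_mode(), so it carries exactly the -ldl dependency
discussed above on glibc versions before 2.34 (where libdl was merged
into libc). It forwards strlen() from libc.so.6 purely so the sketch
runs anywhere; a real libgcc stub would forward the _Unwind_*() entry
points to libunwind.so instead. All names (stub_strlen, resolve_strlen,
real_strlen) are hypothetical.

```c
#include <dlfcn.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example target; a real stub would use the signature
   of the _Unwind_*() function it replaces. */
typedef size_t (*strlen_fn)(const char *);

static _Atomic strlen_fn real_strlen; /* NULL until first call */

/* Load the target library on first use and look up the real entry
   point. Uses the public dlopen(), NOT __libc_dlopen_mode(). */
static strlen_fn resolve_strlen(void)
{
    void *handle = dlopen("libc.so.6", RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "stub: %s\n", dlerror());
        abort();
    }
    strlen_fn fn = (strlen_fn)dlsym(handle, "strlen");
    if (fn == NULL) {
        fprintf(stderr, "stub: %s\n", dlerror());
        abort();
    }
    return fn;
}

/* The stub: same signature as the real function. It resolves the
   target once, caches the pointer, and forwards every call. */
size_t stub_strlen(const char *s)
{
    strlen_fn fn = atomic_load(&real_strlen);
    if (fn == NULL) {
        fn = resolve_strlen();
        /* Racing threads all store the same pointer value. */
        atomic_store(&real_strlen, fn);
    }
    return fn(s);
}
```

With a real target such as libunwind.so, the first call pays for the
dlopen() and later calls only load the cached pointer, so a program
that never calls the stub never loads the library at all.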
Comments/feedback welcome.
--david
Profile without libunwind.so:
Each histogram sample counts as 533.125u seconds
% time  self  cumul  calls  self/call  tot/call  name
 35.38  7.94   7.94   322k      24.7u     25.7u  _dl_relocate_object
 16.46  3.69  11.64  66.3M      55.7n     81.6n  _dl_make_fptr
 10.89  2.45  14.08  8.99M       272n      467n  do_lookup_versioned
  7.57  1.70  15.78  41.2M      41.2n     41.2n  make_fdesc
  4.80  1.08  16.86  42.5M      25.4n     25.4n  ld-2.3.2.so:strcmp
  4.25  0.95  17.81  21.8M      43.8n     43.8n  ld-2.3.2.so:__umoddi3
  3.20  0.72  18.53  9.89M      72.8n     72.8n  ld-2.3.2.so:_dl_elf_hash
  2.55  0.57  19.11   105k      5.44u     97.6u  ld-2.3.2.so:_dl_start
  2.06  0.46  19.57  8.99M      51.5n      592n  _dl_lookup_versioned_symbol
  1.25  0.28  19.85   870k       322n      322n  do_lookup
  1.19  0.27  20.12   195k      1.37u     2.16u  _dl_map_object_from_fd
  0.87  0.20  20.31   103k      1.90u     91.6u  dl_main
Profile when pre-loading separate, local-only libunwind.so:
% time  self  cumul  calls  self/call  tot/call  name
 32.61  8.21   8.21   445k      18.5u     19.4u  _dl_relocate_object
 14.91  3.76  11.97  67.6M      55.6n     81.6n  _dl_make_fptr
 13.01  3.28  15.25  9.39M       349n      596n  do_lookup_versioned
  6.86  1.73  16.97  42.1M      41.0n     41.0n  make_fdesc
  5.72  1.44  18.41  32.9M      43.7n     43.7n  ld-2.3.2.so:__umoddi3
  5.06  1.27  19.69  56.5M      22.5n     22.5n  ld-2.3.2.so:strcmp
  3.24  0.81  20.50  10.4M      78.7n     78.7n  ld-2.3.2.so:_dl_elf_hash
  2.30  0.58  21.08  99.5k      5.83u      111u  ld-2.3.2.so:_dl_start
  2.00  0.50  21.59  9.43M      53.4n      725n  _dl_lookup_versioned_symbol
  1.43  0.36  21.95   296k      1.22u     2.05u  _dl_map_object_from_fd
  1.41  0.35  22.30   879k       404n      404n  do_lookup
  0.87  0.22  22.52  88.0k      2.47u      116u  dl_main