* gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling
From: Jan Kratochvil @ 2011-10-18  9:45 UTC
To: Mark Wielaard; +Cc: Project Archer

Hi Mark,

<warning> moved to a public list </warning>

On Tue, 18 Oct 2011 11:26:03 +0200, Mark Wielaard wrote:
> On Mon, 2011-10-17 at 15:36 +0200, Jan Kratochvil wrote:
> > gcc.post: Drop DW_AT_sibling; remove 27 LoC: -3.49% .debug size, -1.7%
> > GDB time.
>
> Do you have more information about that? Systemtap for example, which
> uses elfutils libdw, uses DW_AT_sibling to more efficiently go through
> the debug_info DIEs.

The patch with various benchmarks is:
    [patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
    http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00992.html

GDB also uses DW_AT_sibling when available (skip_one_die and
locate_pdi_sibling).  Quoting the mail above:
# I guess DW_AT_sibling had real performance gains on CPUs with 1x (=no) clock
# multipliers.  Nowadays mostly only the data size transferred over FSB matters.

The problem is that skipping DIEs is so cheap on current CPUs that it cannot
be compared with the overhead of providing the helper data for it.  I did not
expect that dropping DW_AT_sibling would even be a consumer performance
_improvement_.  I rather expected the difference to be either not measurable
or just not significant enough to justify the .debug on-disk size cost.

I did only gdb and idb benchmarks.  A systemtap benchmark is welcome; the
libstdc++ files for benchmarking, if they are enough for systemtap this way:
    http://people.redhat.com/jkratoch/ns.tar.xz


Thanks,
Jan

* Re: gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling
From: Jan Kratochvil @ 2011-10-18  9:49 UTC
To: Mark Wielaard; +Cc: Project Archer

On Tue, 18 Oct 2011 11:44:57 +0200, Jan Kratochvil wrote:
> The problem is that skipping DIEs is so cheap on current CPUs that it cannot
> be compared with the overhead of providing the helper data for it.  I did not
> expect that dropping DW_AT_sibling would even be a consumer performance
> _improvement_.  I rather expected the difference to be either not measurable
> or just not significant enough to justify the .debug on-disk size cost.

Maybe it could be worth tuning for specific special cases where DW_AT_sibling
skips a larger set of DIEs and some consumer benefits from that case.

At least in the case of GDB there are so many performance issues several
orders of magnitude worse than reading out the CU data that I do not think it
matters much.  The on-disk size should be the primary concern even if it would
mean some performance degradation, which would hardly be measurable.

It is true systemtap is a different kind of consumer, thanks for pointing it
out.


Thanks,
Jan
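
To make the trade-off above concrete, here is a minimal toy sketch in C of what a consumer does when it wants to step over a DIE and its whole subtree. It is not the code of GDB's skip_one_die or of libdw. The names struct toy_die and toy_skip_die, and the flattened array standing in for .debug_info, are assumptions made for illustration only. Real consumers also have to decode abbreviations and attribute forms, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened DIE stream in pre-order, loosely modelled on the
   .debug_info layout: children follow their parent and are closed by a
   null entry.  */
struct toy_die {
  int is_null;       /* 1 = null entry terminating a list of children */
  int has_children;  /* 1 = children follow this DIE */
  size_t sibling;    /* index of the next sibling, 0 = no DW_AT_sibling */
};

/* Return the index just past DIE I and its subtree.  *VISITED is incremented
   once per entry that had to be examined on the way.  */
size_t
toy_skip_die (const struct toy_die *dies, size_t n, size_t i, size_t *visited)
{
  size_t depth, j;

  assert (i < n && !dies[i].is_null);
  ++*visited;

  if (!dies[i].has_children)
    return i + 1;

  /* With a sibling link the whole subtree is skipped in one step.  */
  if (dies[i].sibling != 0)
    return dies[i].sibling;

  /* Without it, walk every entry until the matching null entry.  */
  depth = 1;
  j = i + 1;
  while (depth > 0)
    {
      assert (j < n);
      ++*visited;
      if (dies[j].is_null)
        --depth;
      else if (dies[j].has_children)
        ++depth;
      ++j;
    }
  return j;
}
```

For a parent with two leaf children the walk examines four entries instead of one. The saving only becomes large for big subtrees, which is the special case mentioned above. The cost of the link itself (one more attribute in every such DIE) is paid regardless of subtree size.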

* Re: gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling
From: Mark Wielaard @ 2011-10-19  9:34 UTC
To: Jan Kratochvil; +Cc: Project Archer

Hi Jan,

On Tue, 2011-10-18 at 11:44 +0200, Jan Kratochvil wrote:
> On Tue, 18 Oct 2011 11:26:03 +0200, Mark Wielaard wrote:
> > On Mon, 2011-10-17 at 15:36 +0200, Jan Kratochvil wrote:
> > > gcc.post: Drop DW_AT_sibling; remove 27 LoC: -3.49% .debug size, -1.7%
> > > GDB time.
> >
> > Do you have more information about that? Systemtap for example, which
> > uses elfutils libdw, uses DW_AT_sibling to more efficiently go through
> > the debug_info DIEs.
>
> The patch with various benchmarks is:
>     [patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
>     http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00992.html
>
> GDB also uses DW_AT_sibling when available (skip_one_die and
> locate_pdi_sibling).  Quoting the mail above:
> # I guess DW_AT_sibling had real performance gains on CPUs with 1x (=no) clock
> # multipliers.  Nowadays mostly only the data size transferred over FSB matters.
>
> The problem is that skipping DIEs is so cheap on current CPUs that it cannot
> be compared with the overhead of providing the helper data for it.  I did not
> expect that dropping DW_AT_sibling would even be a consumer performance
> _improvement_.  I rather expected the difference to be either not measurable
> or just not significant enough to justify the .debug on-disk size cost.
>
> I did only gdb and idb benchmarks.  A systemtap benchmark is welcome; the
> libstdc++ files for benchmarking, if they are enough for systemtap this way:
>     http://people.redhat.com/jkratoch/ns.tar.xz

Thanks for those.
Some quick benchmarks show systemtap selection of functions and
function parameters is slightly slower without DW_AT_sibling being
available. But not dramatically.

$ for i in $(find ns -name libstdc++.so.6.0.17.debug); do echo $i; time stap -l "process(\"$i\").function(\"*\")" | wc --lines; done
ns/gccgit-c-xxxxxxxxxxxxx-test/default/libstdc++.so.6.0.17.debug
1852

real    0m0.427s
user    0m0.392s
sys     0m0.036s
ns/gccgit-c-xxxxxxxxxxxxx-test/gdbindex/libstdc++.so.6.0.17.debug
1852

real    0m0.422s
user    0m0.384s
sys     0m0.035s
ns/gccgit-c-ns-xxxxxxxxxx-test/default/libstdc++.so.6.0.17.debug
1852

real    0m0.447s
user    0m0.406s
sys     0m0.042s
ns/gccgit-c-ns-xxxxxxxxxx-test/gdbindex/libstdc++.so.6.0.17.debug
1852

real    0m0.443s
user    0m0.404s
sys     0m0.037s

That is selecting all functions in libstdc++. Systemtap doesn't use
gdbindex, but I included it so you can see the "noise". Here the
slowdown seems somewhat equal to the noise.

If we also want parameters/variables listed for each function probe
point (using -L), things are a bit more visible:

$ for i in $(find ns -name libstdc++.so.6.0.17.debug); do echo $i; time stap -L "process(\"$i\").function(\"*\")" | wc --lines; done
ns/gccgit-c-xxxxxxxxxxxxx-test/default/libstdc++.so.6.0.17.debug
1852

real    0m0.573s
user    0m0.522s
sys     0m0.043s
ns/gccgit-c-xxxxxxxxxxxxx-test/gdbindex/libstdc++.so.6.0.17.debug
1852

real    0m0.558s
user    0m0.505s
sys     0m0.052s
ns/gccgit-c-ns-xxxxxxxxxx-test/default/libstdc++.so.6.0.17.debug
1852

real    0m0.611s
user    0m0.556s
sys     0m0.056s
ns/gccgit-c-ns-xxxxxxxxxx-test/gdbindex/libstdc++.so.6.0.17.debug
1852

real    0m0.603s
user    0m0.557s
sys     0m0.045s

Still no dramatic slowdown, but probably enough to discount random
noise in the measurements. So DW_AT_sibling definitely does help
systemtap/libdw walk a little bit more efficiently through the DIE
tree. But you are right that walking the DIE tree even without them
can be done pretty quickly.

Cheers,

Mark
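
The relative slowdown behind these timings can be worked out with a one-line formula. The helper below is a minimal sketch and slowdown_percent is a hypothetical name. It also assumes that the directories with "ns" in their name are the builds without DW_AT_sibling, which the thread implies but does not state outright.

```c
#include <assert.h>

/* Relative slowdown in percent when going from the run with DW_AT_sibling
   (WITH_S seconds) to the run without it (WITHOUT_S seconds).
   Assumption: the "ns" directories above are the no-sibling builds.  */
double
slowdown_percent (double with_s, double without_s)
{
  assert (with_s > 0.0);
  return (without_s - with_s) / with_s * 100.0;
}
```

Applied to the "real" times of the default builds, slowdown_percent (0.427, 0.447) gives roughly 4.7% for stap -l and slowdown_percent (0.573, 0.611) gives roughly 6.6% for stap -L. The gdbindex pair for -L, 0.558s against 0.603s, comes out at roughly 8.1%. These are single runs, so the figures are only indicative.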

* Re: gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling
From: Jan Kratochvil @ 2011-10-19 11:29 UTC
To: Mark Wielaard; +Cc: Project Archer

Hi Mark,

On Wed, 19 Oct 2011 11:34:18 +0200, Mark Wielaard wrote:
> real    0m0.558s
->
> real    0m0.603s

I find it a significant enough performance degradation.  I will look into a
compromise with selective DW_AT_sibling entries.

Thanks for the stap commands for performance tuning.


Regards,
Jan

* Re: gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling
From: Jan Kratochvil @ 2011-10-19 15:38 UTC
To: Mark Wielaard; +Cc: Project Archer

Hi Mark,

(.debug size)      variant                                   stap (real)  gdb (real)
5059640 = -3.71%   no DW_AT_sibling                          0m0.577s     0m0.234s
5061160 = -3.68%   DW_AT_sibling if >= 256 total children    0m0.572s     0m0.232s
5084040 = -3.25%   DW_AT_sibling if >= 16 total children     0m0.545s     0m0.231s
5169888 = -1.62%   DW_AT_sibling if >= 4 total children      0m0.540s     0m0.235s
5254792 = ------   all DW_AT_sibling                         0m0.536s     0m0.243s

So the stap and gdb performance trends are exactly opposite.

I will redo the timings after another change (DW_FORM_ref_udata) which may yet
change the timing / magic threshold.


Thanks,
Jan


gcc/
2011-10-19  Jan Kratochvil  <jan.kratochvil@redhat.com>

    * dwarf2out.c (add_sibling_attributes): New variables next_die_no and
    this_die_no.  Do not produce DW_AT_sibling for too few children and
    no -gstrict-dwarf.

--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -7490,14 +7490,21 @@ static void
 add_sibling_attributes (dw_die_ref die)
 {
   dw_die_ref c;
+  static unsigned long next_die_no;
+  unsigned long this_die_no = next_die_no++;
 
   if (! die->die_child)
     return;
 
+  FOR_EACH_CHILD (die, c, add_sibling_attributes (c));
+
+  /* -gstrict-dwarf can be used for unconditional DW_AT_sibling for backward
+     compatibility.  */
+  if (!dwarf_strict && next_die_no - this_die_no < 16)
+    return;
+
   if (die->die_parent && die != die->die_parent->die_child)
     add_AT_die_ref (die, DW_AT_sibling, die->die_sib);
-
-  FOR_EACH_CHILD (die, c, add_sibling_attributes (c));
 }
 
 /* Output all location lists for the DIE and its children.  */
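
The counting trick in the patch (number the DIEs in visiting order, then compare the counter before and after recursing) amounts to measuring the size of each subtree. The sketch below shows the same idea on a toy tree. It is not GCC code: struct toy_node, toy_mark_siblings and the wants_sibling flag are hypothetical, the tree uses a plain first-child / next-sibling layout rather than GCC's DIE structure, and the dwarf_strict override is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy tree node, not GCC's dw_die_ref.  */
struct toy_node {
  struct toy_node *first_child;
  struct toy_node *next;   /* next sibling, NULL for the last child */
  int wants_sibling;       /* set where the heuristic would emit the link */
};

/* Mark every node that has children, is not the last child, and whose
   subtree (the node itself plus all descendants) has at least THRESHOLD
   nodes.  Returns the subtree size, which corresponds to
   next_die_no - this_die_no in the patch.  */
unsigned long
toy_mark_siblings (struct toy_node *n, unsigned long threshold)
{
  unsigned long total = 1;
  struct toy_node *c;

  assert (n != NULL);
  for (c = n->first_child; c != NULL; c = c->next)
    total += toy_mark_siblings (c, threshold);

  n->wants_sibling = (n->first_child != NULL
                      && n->next != NULL
                      && total >= threshold);
  return total;
}
```

Returning the subtree size from the recursion avoids the static counter used in the patch, at the cost of a different function signature. With a threshold of 16 this would correspond to the ">= 16 total children" row in the table above, under the assumption that "total children" there counts the DIE itself plus all of its descendants, as the patch's arithmetic does.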