public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* Where stap spends its time producing instrumentation for testsuite/systemtap.syscall/sys.stp
@ 2007-10-03 19:02 William Cohen
  2007-10-04  3:31 ` Frank Ch. Eigler
  0 siblings, 1 reply; 3+ messages in thread
From: William Cohen @ 2007-10-03 19:02 UTC (permalink / raw)
  To: SystemTAP

One of the complaints has been about the amount of time and space
required for the creation of systemtap instrumentation. I decided to
get a concrete data point on this for the
testsuite/systemtap.syscall/sys.stp. This is not an exhaustive
analysis, but a quick experiment using current a current SystemTap
snapshot with locally built copy of elfutils 0.129. This is on 2GHz
86-64 machine with 1GB ram running 2.6.23-0.214.rc8.git2.fc8 kernel
and snapshot of fedora in development. Hopefully, it will give us some
idea of where improvements can be made.

-Will


Set up oprofile to measure things:


$ sudo opcontrol --setup --vmlinux=/usr/lib/debug/lib/modules/`uname -r`/vmlinux 
--separate=library


Below is the the experiment to see how long it takes to build the
instrumentation with oprofile collecting data. The analysis pass (pass
2) and compilation pass take a over 30 seconds each.


$  sudo opcontrol --start; ../../../install/bin/stap -p4 -v sys.stp -m xyz -k; 
sudo opcontrol --shutdown
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
Warning: using '-m' disables cache support.
Pass 1: parsed user script and 37 library script(s) in 530usr/20sys/626real ms.
Pass 2: analyzed script: 768 probe(s), 108 function(s), 14 embed(s), 4 global(s) 
in 39690usr/360sys/40385real ms.
Pass 3: translated to C into "/tmp/stapnoy9IG/xyz.c" in 300usr/110sys/462real ms.

Pass 4: compiled C into "xyz.ko" in 34080usr/1500sys/36972real ms.
Keeping temporary directory "/tmp/stapnoy9IG"
Stopping profiling.
Killing daemon.


Quick look at the size of the generated module source code and object code.
Size of generated code, about 120K lines of code generated by
translator, which results in 1.13MB text and 170KB of data.

$ wc /tmp/stapnoy9IG/xyz.c
  120522  352489 4295506 /tmp/stapnoy9IG/xyz.c
$ size /tmp/stapnoy9IG/xyz.ko
    text    data     bss     dec     hex filename
1134094  169408    4816 1308318  13f69e /tmp/stapnoy9IG/xyz.ko




Overall look at the data collected by oprofile below. Each sample of
CPU_CLK_UNHALTED is approximately 50ms (1,000,000 cycle
cycles/2GHz). This agrees time reported by stap for the phases 79459
samples for stap, about 39 seconds. cc1 and as have about 33 seconds
of samples between them. For stap much of the time is spent in libdw.


$ opreport -t1
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 1000000
Counted RETIRED_INSTRUCTIONS events (Retired instructions (includes exceptions, 
interrupts, re-syncs)) with a unit mask of 0x00 (No unit mask) count 1000000
CPU_CLK_UNHALT...|RETIRED_INSTRU...|
   samples|      %|  samples|      %|
------------------------------------
     79469 51.6932     90998 58.4463 stap
         CPU_CLK_UNHALT...|RETIRED_INSTRU...|
           samples|      %|  samples|      %|
         ------------------------------------
             31962 40.2195     38307 42.0965 libdw.so.1
             25285 31.8174     30188 33.1744 stap
             12849 16.1686     12348 13.5695 libc-2.6.90.so
              8149 10.2543      8118  8.9211 libstdc++.so.6.0.8
              1165  1.4660      2021  2.2209 libelf.so.1
                38  0.0478        12  0.0132 ld-2.6.90.so
                19  0.0239         2  0.0022 libpthread-2.6.90.so
                 2  0.0025         2  0.0022 libdl-2.6.90.so
     63810 41.5073     57978 37.2382 cc1
         CPU_CLK_UNHALT...|RETIRED_INSTRU...|
           samples|      %|  samples|      %|
         ------------------------------------
             58801 92.1501     53243 91.8331 cc1
              5006  7.8452      4734  8.1652 libc-2.6.90.so
                 3  0.0047         1  0.0017 ld-2.6.90.so
      6847  4.4539      3672  2.3585 vmlinux
      2779  1.8077      2424  1.5569 as
         CPU_CLK_UNHALT...|RETIRED_INSTRU...|
           samples|      %|  samples|      %|
         ------------------------------------
              1988 71.5365      1568 64.6865 as
               731 26.3044       806 33.2508 libc-2.6.90.so
                52  1.8712        49  2.0215 libbfd-2.17.50.0.18-1.so
                 8  0.2879         1  0.0413 ld-2.6.90.so


Targetted look where which functions stap is spending it's time
below. See that __libdw_find_attr and __libdw_form_val_len are the hotspot and 
consume almost 1/3 of the stap's runtime.

$ opreport  -l -t1 ../../../install/bin/stap
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 1000000
Counted RETIRED_INSTRUCTIONS events (Retired instructions (includes exceptions, 
interrupts, re-syncs)) with a unit mask of 0x00 (No unit mask) count 1000000
samples  %        samples  %        image name               symbol name
13131    16.5234  13807    15.1729  libdw.so.1               __libdw_find_attr
12639    15.9043  17175    18.8740  libdw.so.1               __libdw_form_val_len
8149     10.2543  8118      8.9211  libstdc++.so.6.0.8       (no symbols)
4545      5.7192  4278      4.7012  libc-2.6.90.so           strlen
2550      3.2088  2561      2.8143  stap 
query_dwarf_func(Dwarf_Die*, void*)
2052      2.5821  1354      1.4879  libc-2.6.90.so           malloc
1703      2.1430  3527      3.8759  libdw.so.1               dwfl_module_getsym
1676      2.1090  1534      1.6858  stap 
dwflpp::focus_on_function(Dwarf_Die*)
1575      1.9819  1965      2.1594  stap 
__gnu_cxx::__normal_iterator<Dwarf_Die const*, std::vector<Dwarf_Die, 
std::allocator<Dwarf_Die> > >::__normal_iterator(Dwarf_Die const* const&)
1551      1.9517  1920      2.1099  stap 
std::vector<Dwarf_Die, std::allocator<Dwarf_Die> >::size() const
1527      1.9215  1831      2.0121  stap 
dwflpp::iterate_over_functions(int (*)(Dwarf_Die*, void*), void*)
1243      1.5641  1788      1.9649  libc-2.6.90.so           memcpy
1190      1.4974  971       1.0671  libdw.so.1               dwarf_attr
1156      1.4547  1560      1.7143  stap 
lookup_symbol_address(Dwfl_Module*, char const*)
1135      1.4282  2015      2.2143  libelf.so.1              gelf_getsymshndx
1057      1.3301  904       0.9934  libc-2.6.90.so           _int_free
1038      1.3062  876       0.9627  libc-2.6.90.so           _int_malloc
1029      1.2948  998       1.0967  stap                     .plt
990       1.2458  1360      1.4945  stap 
__gnu_cxx::__normal_iterator<Dwarf_Die const*, std::vector<Dwarf_Die, 
std::allocator<Dwarf_Die> > >::difference_type __gnu_cxx::operator-<Dwarf_Die 
const*, Dwarf_Die const*, std::vector<Dwarf_Die, std::allocator<Dwarf_Die> > 
 >(__gnu_cxx::__normal_iterator<Dwarf_Die const*, std::vector<Dwarf_Die, 
std::allocator<Dwarf_Die> > > const&, __gnu_cxx::__normal_iterator<Dwarf_Die 
const*, std::vector<Dwarf_Die, std::allocator<Dwarf_Die> > > const&)
927       1.1665  1443      1.5857  stap 
dwflpp::func_is_inline()
914       1.1501  1093      1.2011  libc-2.6.90.so           internal_fnmatch
897       1.1287  957       1.0517  stap 
std::vector<Dwarf_Die, std::allocator<Dwarf_Die> >::at(unsigned long)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-10-04 14:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-03 19:02 Where stap spends its time producing instrumentation for testsuite/systemtap.syscall/sys.stp William Cohen
2007-10-04  3:31 ` Frank Ch. Eigler
2007-10-04 14:17   ` William Cohen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).