public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* OProfile examination of  where SystemTap spends time
@ 2006-02-06 16:34 William Cohen
  0 siblings, 0 replies; only message in thread
From: William Cohen @ 2006-02-06 16:34 UTC (permalink / raw)
  To: SystemTAP

Last Thursday I did a quick experiment on a AMD64 machine running
FC5t2 to see where time is spent when running the testsuite. I have a
script oprofdata-collect.sh that setups up oprofile for the
experiment. Assuming that the processor is running 2GHz, the sampling
to take 2,000 samples per second on the AMD64; the power now throttles
down to 800MHz when the processor is idle.

Below is the overall view of samples for everything with 1% or more of
the samples for a "runtest --tool=systemtap":


$ opreport session:data.5839f --long-filenames -t 1
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a 
unit mask of 0x00 (No unit mask) count 1000000
CPU_CLK_UNHALT...|
   samples|      %|
------------------
    175290 54.9835 /usr/libexec/gcc/x86_64-redhat-linux/4.1.0/cc1
         CPU_CLK_UNHALT...|
           samples|      %|
         ------------------
            156557 89.3131 /usr/libexec/gcc/x86_64-redhat-linux/4.1.0/cc1
             18701 10.6686 /lib64/libc-2.3.90.so
                32  0.0183 /lib64/ld-2.3.90.so
     70077 21.9811 /usr/lib/debug/lib/modules/2.6.15-1.1895_FC5/vmlinux
     29667  9.3057 /home/wcohen/stap_testing_200602022040/obj/stap
         CPU_CLK_UNHALT...|
           samples|      %|
         ------------------
             14487 48.8320 /home/wcohen/stap_testing_200602022040/obj/stap
              7523 25.3581 /usr/lib64/libstdc++.so.6.0.7
              4577 15.4279 
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so
              2721  9.1718 /lib64/libc-2.3.90.so
               203  0.6843 /lib64/ld-2.3.90.so
               127  0.4281 
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libelf-0.119.so
                28  0.0944 
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/systemtap/libebl_x86_64-0.119.so
                 1  0.0034 /lib64/libdl-2.3.90.so
     24681  7.7417 /usr/bin/as
         CPU_CLK_UNHALT...|
           samples|      %|
         ------------------
             17890 72.4849 /usr/bin/as
              5665 22.9529 /lib64/libc-2.3.90.so
              1110  4.4974 /usr/lib64/libbfd-2.16.91.0.5.so
                16  0.0648 /lib64/ld-2.3.90.so
      5269  1.6527 
/usr/src/kernels/2.6.15-1.1895_FC5-x86_64/scripts/mod/modpost
         CPU_CLK_UNHALT...|
           samples|      %|
         ------------------
              2719 51.6037 
/usr/src/kernels/2.6.15-1.1895_FC5-x86_64/scripts/mod/modpost
              2531 48.0357 /lib64/libc-2.3.90.so
                19  0.3606 /lib64/ld-2.3.90.so


A significant amount of time is spent in the traditional compilation
process converting the C code into exectuable, 55% in cc1 (C to
assembly language conversion) and 7.7% in as (assembly to object
code). The kernel is taking about 22%. The systemtap translator is
taking 9.3%.


The systemtap translator doesn't have a huge hotspot as seen
below. The highest thing on the list is the string compare at
5.6%. The elfutils routines to figure out where to place probe points
are next.

$ opreport session:data.5839f --long-filenames -l 
/home/wcohen/stap_testing_200602022040/obj/stap -t 2
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a 
unit mask of 0x00 (No unit mask) count 1000000
samples  %        image name               symbol name
1660      5.5954  /usr/lib64/libstdc++.so.6.0.7 
std::string::compare(char const*) const
1209      4.0752 
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so 
lookup
1101      3.7112 
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so 
__libdw_find_attr
947       3.1921  /home/wcohen/stap_testing_200602022040/obj/stap 
lexer::input_peek(unsigned int)
837       2.8213  /lib64/libc-2.3.90.so    _int_malloc
836       2.8179  /home/wcohen/stap_testing_200602022040/obj/stap 
lexer::scan()
791       2.6663  /home/wcohen/stap_testing_200602022040/obj/stap 
__gnu_cxx::__normal_iterator<int const*, std::vector<int, 
std::allocator<int> > >::__normal_iterator(int const* const&)
728       2.4539  /home/wcohen/stap_testing_200602022040/obj/stap 
__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> 
 > >::__normal_iterator(int* const&)
617       2.0798  /home/wcohen/stap_testing_200602022040/obj/stap 
std::vector<int, std::allocator<int> 
 >::erase(__gnu_cxx::__normal_iterator<int*, std::vector<int, 
std::allocator<int> > >)
609       2.0528  /lib64/libc-2.3.90.so    memcpy
597       2.0123  /home/wcohen/stap_testing_200602022040/obj/stap 
lexer::input_get()
594       2.0022  /home/wcohen/stap_testing_200602022040/obj/stap 
std::vector<int, std::allocator<int> >::size() const


The kernel being used has debugging turned on. One can see that the
check_poison_obj at the top of the list with 17.8%. It looks like stap
is exercising the paging mechanism with the number of samples in
clear_page, page_fault, and __handle_mm_fault.

$ opreport session:data.5839f --long-filenames -t 1 -l 
/usr/lib/debug/lib/modules/2.6.15-1.1895_FC5/vmlinux
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a 
unit mask of 0x00 (No unit mask) count 1000000
samples  %        symbol name
12468    17.7919  check_poison_obj
12458    17.7776  clear_page
2520      3.5960  page_fault
2469      3.5233  __handle_mm_fault
2397      3.4205  number
2260      3.2250  _raw_spin_lock
2091      2.9839  copy_user_generic
1227      1.7509  _raw_spin_unlock
1139      1.6254  memset
1011      1.4427  do_page_fault
977       1.3942  get_page_from_freelist
862       1.2301  strncpy
789       1.1259  vsnprintf

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2006-02-06 16:34 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-02-06 16:34 OProfile examination of where SystemTap spends time William Cohen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).