public inbox for systemtap@sourceware.org
* Phase 2 takes too much time to complete
@ 2009-02-11 17:59 Roberto Natella
  2009-02-11 21:14 ` Frank Ch. Eigler
  0 siblings, 1 reply; 6+ messages in thread
From: Roberto Natella @ 2009-02-11 17:59 UTC (permalink / raw)
  To: systemtap

[-- Attachment #1: Type: text/plain, Size: 2570 bytes --]

Dear SystemTap developers,
I noticed that the compile time for a non-cached module has greatly
increased since SystemTap snapshot 20070721.
On my system (RHEL4 with 2.6.25 kernel; dual Intel Xeon(TM) CPU
2.80GHz with HT; 5 GB RAM), phase 2
takes about half an hour to complete. This problem also occurs in
SystemTap 0.8 and in the most recent snapshots.
Using strace, I found that stap spends most of its time generating the
header "stap-symbols.h" in the function
dump_unwindsyms() (in translate.cxx). Here is the "guilty" code:

  // Add unwind data to be included if it exists for this module.
  size_t len = 0;
  void *unwind = get_unwind_data (m, &len);
  if (unwind != NULL)
    {
      c->output << "#ifdef STP_USE_DWARF_UNWINDER" << endl;
      c->output << "static uint8_t _stp_module_" << stpmod_idx
        << "_unwind_data[] = " << endl;
      c->output << "  {";
      for (size_t i = 0; i < len; i++)
    {
      int h = ((uint8_t *)unwind)[i];
      c->output << "0x" << hex << h << dec << ",";
      if ((i + 1) % 16 == 0)
        c->output << endl << "   ";
    }
      c->output << "};" << endl;
      c->output << "#endif /* STP_USE_DWARF_UNWINDER */" << endl;
    }

and

  for (unsigned secidx = 0; secidx < seclist.size(); secidx++)
    {
      c->output << "struct _stp_symbol "
                << "_stp_module_" << stpmod_idx << "_symbols_"
                << secidx << "[] = {" << endl;

      // We write out a *sorted* symbol table, so the runtime
      // doesn't have to sort them later.
      for (addrmap_t::iterator it = addrmap[secidx].begin();
           it != addrmap[secidx].end(); it++)
        {
          if (it->first < extra_offset)
            continue; // skip symbols that occur before our chosen base address

          c->output << "  { 0x" << hex << it->first - extra_offset << dec
                    << ", " << lex_cast_qstring (it->second) << " }," << endl;
        }
      c->output << "};" << endl;
    }

As far as I know, the "stap-symbols.h" file contains the list of
kernel symbols (sorted ahead of time)
in order to speed up symbol resolution at run time. Because the symbol
list remains the same if the kernel
is unchanged, I think that stap-symbols.h should be cached in order to
speed up phase 2. You can find attached a
very simple patch I made against SystemTap 0.8 which caches stap-symbols.h
into ~/.systemtap
(it only takes into account the modification time of
/boot/vmlinux-VERSION, so it can be greatly improved).
Please consider adding a caching mechanism for stap-symbols.h
in future releases of SystemTap.
Thank you for your support,
Roberto Natella

[-- Attachment #2: cache_unwind_symbols.patch --]
[-- Type: application/octet-stream, Size: 1570 bytes --]

diff -uNr systemtap-0.8-old/translate.cxx systemtap-0.8-new/translate.cxx
--- systemtap-0.8-old/translate.cxx	2008-11-13 21:29:23.000000000 +0100
+++ systemtap-0.8-new/translate.cxx	2009-01-26 17:18:05.000000000 +0100
@@ -28,6 +28,10 @@
 #include <elfutils/libdwfl.h>
 }
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 using namespace std;
 
 struct var;
@@ -4645,12 +4649,39 @@
 void
 emit_symbol_data (systemtap_session& s)
 {
+  /*
   string symfile = "stap-symbols.h";
 
   s.op->newline() << "#include " << lex_cast_qstring (symfile);
 
   ofstream kallsyms_out ((s.tmpdir + "/" + symfile).c_str());
+  */
 
+  string kernel_image = string("/boot/vmlinux-")+s.kernel_release;
+  struct stat stat_kernel_image;
+  stat(kernel_image.c_str(), &stat_kernel_image);
+
+  char last_modification[40];
+  sprintf(last_modification, "%lld", (long long int) stat_kernel_image.st_mtime);
+  
+  string symfile = string("stap-symbols-") + s.kernel_release.c_str() + "-" + string(last_modification) + ".h";
+  
+  string symbols_dir = s.data_path + "/symbols";
+  
+  string symfile_path = symbols_dir + "/" + symfile;
+    
+  s.op->newline() << "#include " << lex_cast_qstring (symfile_path.c_str());
+
+  if(create_dir(symbols_dir.c_str()) == 0) {
+  	struct stat stat_cached_sym;
+
+  	if(stat(symfile_path.c_str(), &stat_cached_sym) == 0) {
+		return;
+	}
+  }
+  
+  ofstream kallsyms_out (symfile_path.c_str());
+  
   unwindsym_dump_context ctx = { s, kallsyms_out, 0, s.unwindsym_modules };
 
   // XXX: copied from tapsets.cxx dwflpp::, sadly

* Re: Phase 2 takes too much time to complete
  2009-02-11 17:59 Phase 2 takes too much time to complete Roberto Natella
@ 2009-02-11 21:14 ` Frank Ch. Eigler
  2009-02-12  0:24   ` Mark Wielaard
  0 siblings, 1 reply; 6+ messages in thread
From: Frank Ch. Eigler @ 2009-02-11 21:14 UTC (permalink / raw)
  To: Roberto Natella; +Cc: systemtap


Roberto Natella <rnatella@gmail.com> writes:

> [...]  On my system (RHEL4 with 2.6.25 kernel; dual Intel Xeon(TM)
> CPU 2.80GHz with HT; 5 GB RAM), phase 2 takes about half an hour to
> complete. [...]

That's wacky!


> SystemTap 0.8 and in the most recent snapshots.  Using strace, I found
> that stap spends most of its time generating the header
> "stap-symbols.h" in the function dump_unwindsyms() (in translate.cxx).

There are several sources of delay in there.

First, this code re-traverses the kernel module tree looking for
.ko's, which is basically a "find /lib/modules/.../" that can chew
through a lot of inodes.  Instead, this information should be
extracted & saved the first time we go through the modules (in pass 2
rather than pass 3).
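
Very roughly, something like the following (untested sketch, not
existing systemtap code; the one-shot cache and the helper name are
invented for illustration):

  // Walk /lib/modules/<release> once, remember the .ko paths, and
  // reuse them on later passes instead of re-running the "find".
  #include <ftw.h>
  #include <sys/stat.h>
  #include <cstring>
  #include <string>
  #include <vector>

  static std::vector<std::string> cached_kos;    // invented cache

  static int
  collect_ko (const char *path, const struct stat *, int type)
  {
    size_t len = std::strlen (path);
    if (type == FTW_F && len > 3 && std::strcmp (path + len - 3, ".ko") == 0)
      cached_kos.push_back (path);
    return 0;                                    // keep walking
  }

  static const std::vector<std::string>&
  kernel_module_paths (const std::string& release)
  {
    if (cached_kos.empty ())                     // traverse only once
      ftw (("/lib/modules/" + release).c_str (), collect_ko, 16);
    return cached_kos;
  }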

Second, this code uses "endl" a lot, which can result in premature
buffer flush syscalls.  If your strace is showing a lot of small
individual writes when producing this data, then it's worth trying to
replace '<< endl' with '<< "\n"' throughout this function.
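
The effect is easy to see in isolation (toy example, nothing to do
with the real translate.cxx code):

  // toy.cxx: std::endl flushes the stream on every use, which on an
  // ofstream can turn into one small write() per line; "\n" just adds
  // a character to the stream buffer, so writes stay large and few.
  #include <fstream>

  int main ()
  {
    std::ofstream out ("dump.h");
    for (int i = 0; i < 100000; i++)
      out << "0x" << std::hex << i << ",\n";   // buffered; try std::endl to compare
    return 0;                                  // one final flush at destruction
  }

Running both variants under "strace -e write" should show the
difference in the number of write() calls.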

Third ... well, let's check in with an strace or even some oprofile/gprof
data first.


> [...] Because the symbol list remains the same if the kernel is
> unchanged, I think that stap-symbols.h should be cached in order to
> speed up phase 2. You can find a very simple patch I made on
> SystemTap 0.8 which caches stap-symbols.h [...]

That's not a bad idea at all, and thanks for working out a patch to do
it, but I think we should be able to solve it somewhat differently.
(For one thing, the list of modules included in there can include
userspace binaries and different subsets of the kernel modules, so the
cache key can't be just a function of the kernel image.)
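
Something along these lines, say (untested sketch; the helper and the
exact key ingredients are made up, the point is only that the key has
to cover every module that feeds stap-symbols.h):

  // Build a cache key out of the kernel release plus the identity and
  // mtime/size of every module we emit unwind/symbol data for; the
  // caller would hash the resulting string to get a cache file name.
  #include <set>
  #include <sstream>
  #include <string>
  #include <sys/stat.h>

  static std::string
  symbol_cache_key (const std::string& kernel_release,
                    const std::set<std::string>& unwindsym_modules)
  {
    std::ostringstream key;
    key << kernel_release;
    for (std::set<std::string>::const_iterator it = unwindsym_modules.begin();
         it != unwindsym_modules.end(); ++it)
      {
        struct stat st;
        key << "|" << *it;
        if (stat (it->c_str(), &st) == 0)   // real paths, e.g. user binaries
          key << ":" << st.st_mtime << ":" << st.st_size;
      }
    return key.str();
  }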


- FChE

* Re: Phase 2 takes too much time to complete
  2009-02-11 21:14 ` Frank Ch. Eigler
@ 2009-02-12  0:24   ` Mark Wielaard
  2009-02-12 14:54     ` Roberto Natella
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Wielaard @ 2009-02-12  0:24 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Roberto Natella, systemtap

Hi Roberto,

On Wed, 2009-02-11 at 11:31 -0500, Frank Ch. Eigler wrote:
> Roberto Natella <rnatella@gmail.com> writes:
> 
> > [...]  On my system (RHEL4 with 2.6.25 kernel; dual Intel Xeon(TM)
> > CPU 2.80GHz with HT; 5 GB RAM), phase 2 takes about half an hour to
> > complete. [...]
> 
> That's wacky!

30 minutes is indeed a little excessive.
I have never seen anything take that long.
Do you have a sample stap invocation that shows this?
Are there any specifics about your kernel build that might be different
from others?

> Second, this code uses "endl" a lot, which can result in premature
> buffer flush syscalls.  If your strace is showing a lot of small
> individual writes when producing this data, then it's worth trying to
> replace '<< endl' with '<< "\n"' throughout this function.

That seems an easy change to make and might indeed make sense for the
inner loops. I'll time some things and see if it makes a difference for
me.

Cheers,

Mark

* Re: Phase 2 takes too much time to complete
  2009-02-12  0:24   ` Mark Wielaard
@ 2009-02-12 14:54     ` Roberto Natella
  2009-02-12 19:58       ` Mark Wielaard
  0 siblings, 1 reply; 6+ messages in thread
From: Roberto Natella @ 2009-02-12 14:54 UTC (permalink / raw)
  To: systemtap

2009/2/11 Mark Wielaard <mjw@redhat.com>:
>> Second, this code uses "endl" a lot, which can result in premature
>> buffer flush syscalls.  If your strace is showing a lot of small
>> individual writes when producing this data, then it's worth trying to
>> replace '<< endl' with '<< "\n"' throughout this function.
>
> That seems an easy change to make and might indeed make sense for the
> inner loops. I'll time some things and see if it makes a difference for
> me.
>

Hi,
in the previous mail, I mistakenly wrote "phase 2" instead of "phase 3".
Compile time clearly improved after substituting "endl" with "\n"
(2552835 vs 119837 real ms). Moreover, it seems that caching
stap-symbols.h does not further reduce compile time (120789 real ms).
The results are:

WITH "endl":
$ stap -c df -e 'probe syscall.* { if (target()==pid()) log(name."
".argstr) }' -v
Pass 1: parsed user script and 47 library script(s) in 500usr/10sys/555real ms.
Pass 2: analyzed script: 317 probe(s), 120 function(s), 14 embed(s), 1
global(s) in 5570usr/2160sys/10397real ms.
Pass 3: translated to C into
"/tmp/stapLcLRff/stap_1a0900c4b393841ad08f615f866d76cd_206544.c" in
1190usr/5260sys/2552835real ms.
Pass 4: compiled C into
"stap_1a0900c4b393841ad08f615f866d76cd_206544.ko" in
13680usr/1830sys/63733real ms.
Pass 5: starting run.

WITH "\n":
$ stap -c df -e 'probe syscall.* { if (target()==pid()) log(name."
".argstr) }' -v
Pass 1: parsed user script and 47 library script(s) in 480usr/20sys/531real ms.
Pass 2: analyzed script: 317 probe(s), 120 function(s), 14 embed(s), 1
global(s) in 5640usr/2450sys/12262real ms.
Pass 3: translated to C into
"/tmp/stapRobpPy/stap_0659fff5e2f372ad6e3c2e0101921c0a_206861.c" in
230usr/180sys/119837real ms.
Pass 4: compiled C into
"stap_0659fff5e2f372ad6e3c2e0101921c0a_206861.ko" in
13420usr/1850sys/63929real ms.
Pass 5: starting run.

WITH stap-symbols.h CACHED:
$ stap -c df -e 'probe syscall.* { if (target()==pid()) log(name." ".argstr) }' -v
Pass 1: parsed user script and 47 library script(s) in 480usr/30sys/528real ms.
Pass 2: analyzed script: 317 probe(s), 120 function(s), 14 embed(s), 1 global(s) in 5690usr/2380sys/12149real ms.
Pass 3: translated to C into "/tmp/stapzgskoZ/stap_14f519a25c02081dbd15de7aed797287_207178.c" in 160usr/180sys/120789real ms.
Pass 4: compiled C into "stap_14f519a25c02081dbd15de7aed797287_207178.ko" in 13380usr/2030sys/62858real ms.
Pass 5: starting run.

Thank you for your suggestion,
Roberto Natella

* Re: Phase 2 takes too much time to complete
  2009-02-12 14:54     ` Roberto Natella
@ 2009-02-12 19:58       ` Mark Wielaard
  2009-02-13 10:24         ` Roberto Natella
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Wielaard @ 2009-02-12 19:58 UTC (permalink / raw)
  To: Roberto Natella; +Cc: systemtap

Hi Roberto,

On Thu, 2009-02-12 at 15:24 +0100, Roberto Natella wrote:
> Compile time clearly improved
> after substituting "endl" with "\n" (2552835 vs 119837 real ms).

Thanks a lot for testing. I wasn't seeing any real improvement myself,
but Frank suggested I look at the number of writes to see if there was
some difference. And there was!
So, running stap on itself (with -k to bypass the cache):

WITH endl:
$ stap -e 'global writes=0; probe syscall.write { if (pid() == target())
writes++ } probe end { printf("writes: %d\n", writes) }' -c 'stap -k -e
"probe syscall.* {exit()}"'
Keeping temporary directory "/tmp/stapzNwlc9"
writes: 65279

WITH "\n":
$ stap -e 'global writes=0; probe syscall.write { if (pid() == target())
writes++ } probe end { printf("writes: %d\n", writes) }' -c 'stap -k -e
"probe syscall.* {exit()}"'
Keeping temporary directory "/tmp/stapsmJRtU"
writes: 80

(Both repeated 3 times, with exactly the same numbers each time.)

So that is definitely part of the slowdown.
I committed the change to git.

I am still somewhat puzzled by the huge real time you are seeing; I
cannot explain that. One thing we were looking at was memory usage
(we should be smarter about scanning the kernel modules instead of
keeping them all in memory). But if you have enough memory that
shouldn't really matter. Is your machine tight on memory?

Cheers,

Mark

* Re: Phase 2 takes too much time to complete
  2009-02-12 19:58       ` Mark Wielaard
@ 2009-02-13 10:24         ` Roberto Natella
  0 siblings, 0 replies; 6+ messages in thread
From: Roberto Natella @ 2009-02-13 10:24 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap

2009/2/12 Mark Wielaard <mjw@redhat.com>:
> I am still somewhat puzzled by the huge real time you are seeing. I
> cannot explain that. One thing that we were looking at was memory usage
> (we should do something more clever with scanning the kernel modules and
> not keep them all in memory). But if you have enough memory that
> shouldn't really matter. Is your machine tight on memory?

I don't think so, because the machine has almost 5 GB of physical
memory. The problem may instead be caused by the custom kernel I
compiled (I enabled several kernel hacking options such as spinlock
tracing, latencytop, etc.), so each read operation may incur a fixed
delay.
Best regards
Roberto
