public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine
  2009-09-20 16:52 [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on mjw at redhat dot com
@ 2009-09-20 16:52 ` mjw at redhat dot com
  2009-09-20 18:45 ` mjw at redhat dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2009-09-20 16:52 UTC (permalink / raw)
  To: systemtap



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|semok/twenty.stp takes 15   |semok/twenty.stp takes 15
                   |minutes on                  |minutes on slow/low-memorry
                   |                            |machine


http://sourceware.org/bugzilla/show_bug.cgi?id=10671

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on
@ 2009-09-20 16:52 mjw at redhat dot com
  2009-09-20 16:52 ` [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine mjw at redhat dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2009-09-20 16:52 UTC (permalink / raw)
  To: systemtap

The semok/twenty.stp test takes 15 minutes on my slow laptop. The test simply
consists of all dwarfable probes for the kernel and all modules:

probe kernel.function("*") {}
probe module("*").function("*") {}
probe kernel.function("*").call {}
probe module("*").function("*").call {}
probe kernel.function("*").return {}
probe module("*").function("*").return {}
probe kernel.function("*").inline {}
probe module("*").function("*").inline {}

It looks like on this system processing of almost 2000 modules takes most of
that time (the machine has a slow disk and "only" 1.5GB of memory).

-- 
           Summary: semok/twenty.stp takes 15 minutes on
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: testsuite
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: mjw at redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=10671

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine
  2009-09-20 16:52 [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on mjw at redhat dot com
  2009-09-20 16:52 ` [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine mjw at redhat dot com
@ 2009-09-20 18:45 ` mjw at redhat dot com
  2009-09-20 23:33 ` fche at redhat dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2009-09-20 18:45 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2009-09-20 18:45 -------
Some seconds (less than 30) can be won with the following patch for elfutils:
https://fedorahosted.org/pipermail/elfutils-devel/2009-September/000678.html
Still 14.5 minutes to go...

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10671

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine
  2009-09-20 16:52 [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on mjw at redhat dot com
  2009-09-20 16:52 ` [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine mjw at redhat dot com
  2009-09-20 18:45 ` mjw at redhat dot com
@ 2009-09-20 23:33 ` fche at redhat dot com
  2009-09-21  0:01 ` fche at redhat dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: fche at redhat dot com @ 2009-09-20 23:33 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2009-09-20 23:33 -------
One possibility for pass-2 speedup is to clear out the various dwarf data 
periodically by calling build_no_more if we detect we are paging.

On my tests, on a well-RAM'd machine, the bulk of the time is in fact in
pass 3 - the unwindsyms data dumping.  That shouldn't be.

/usr/bin/time ./stap -r 2.6.30.5-43.fc11.x86_64 -e 'probe
module("*").function("*") {}' -t -p4 -v

Pass 1: parsed user script and 59 library script(s) in 100usr/10sys/181real ms.
Pass 2: analyzed script: 269484 probe(s), 0 function(s), 0 embed(s), 0 global(s)
in 8740usr/2800sys/62297real ms.
Pass 3: translated to C into
"/var/tmp/stapjsYUU9/stap_8924a4d5711146589d48a4be4e629fe1_39304877.c" in
396730usr/5350sys/746939real ms.
Pass 4: compiled C into "stap_8924a4d5711146589d48a4be4e629fe1_39304877.ko" in
28320usr/3590sys/46712real ms.
434.92user 12.29system 14:19.43elapsed 52%CPU (0avgtext+0avgdata 0maxresident)k
3043496inputs+854280outputs (22244major+1375284minor)pagefaults 0swaps


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10671

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine
  2009-09-20 16:52 [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on mjw at redhat dot com
                   ` (2 preceding siblings ...)
  2009-09-20 23:33 ` fche at redhat dot com
@ 2009-09-21  0:01 ` fche at redhat dot com
  2009-09-22  9:39 ` mjw at redhat dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: fche at redhat dot com @ 2009-09-21  0:01 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2009-09-21 00:00 -------
After this patch:

diff --git a/translate.cxx b/translate.cxx
index 04a9247..c304178 100644
--- a/translate.cxx
+++ b/translate.cxx
@@ -4482,7 +4482,7 @@ static void get_unwind_data (Dwfl_Module *m,
 
   // fetch .eh_frame info preferably from main elf file.
   dwfl_module_info (m, NULL, &start, NULL, NULL, NULL, NULL, NULL);
-  elf = dwfl_module_getelf(m, &bias);
+  elf = (dwarf_getelf (dwfl_module_getdwarf (m, &bias)) ?: dwfl_module_getelf
(m, &bias));
   ehdr = gelf_getehdr(elf, &ehdr_mem);
   scn = NULL;
   while ((scn = elf_nextscn(elf, scn)))

the situation improves somewhat.  Then, random SIGINT/backtraces appear to
implicate translate.cxx:4740, the call to dwfl_module_address_section().

Program received signal SIGINT, Interrupt.
0x00002baa1f167330 in gelf_getsymshndx@plt ()
   from
/home/fche/Private/DEVEL/DEVEL-systemtap/git/BUILD2/../INST2/lib/systemtap/libdw.so.1
(gdb) bt
#0  0x00002baa1f167330 in gelf_getsymshndx@plt ()
   from
/home/fche/Private/DEVEL/DEVEL-systemtap/git/BUILD2/../INST2/lib/systemtap/libdw.so.1
#1  0x00002baa1f1721f0 in resolve_symbol (shndx=2, sym=0x7fffdc87d5d0,
symtab=<value optimized out>, 
    referer=<value optimized out>) at
/home/fche/Private/DEVEL/DEVEL-systemtap/elfutils-0.141/libdwfl/relocate.c:246
#2  relocate (shndx=2, sym=0x7fffdc87d5d0, symtab=<value optimized out>,
referer=<value optimized out>)
    at
/home/fche/Private/DEVEL/DEVEL-systemtap/elfutils-0.141/libdwfl/relocate.c:348
#3  0x00002baa1f1727a8 in relocate_section (ehdr=<value optimized out>,
shstrndx=<value optimized out>, 
    reloc_symtab=<value optimized out>, scn=<value optimized out>,
shdr=0x7fffdc87d7a0, tscn=0x17856350, debugscn=false, 
    partial=true) at
/home/fche/Private/DEVEL/DEVEL-systemtap/elfutils-0.141/libdwfl/relocate.c:501
#4  0x00002baa1f172cd2 in __libdwfl_relocate_section (mod=0xc7ace60,
relocated=0x17856100, relocscn=0x17856408, 
    tscn=0x17856350, partial=true) at
/home/fche/Private/DEVEL/DEVEL-systemtap/elfutils-0.141/libdwfl/relocate.c:633
#5  0x00002baa1f173b5c in dwfl_module_address_section (mod=0xc7ace60,
address=<value optimized out>, bias=0x7fffdc87dc78)
    at
/home/fche/Private/DEVEL/DEVEL-systemtap/elfutils-0.141/libdwfl/derelocate.c:398
#6  0x00000000004633b3 in dump_unwindsyms (m=0xc7ace60, userdata=<value
optimized out>, name=<value optimized out>, 
    base=<value optimized out>, arg=<value optimized out>) at
../systemtap2/translate.cxx:4740


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10671

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine
  2009-09-20 16:52 [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on mjw at redhat dot com
                   ` (3 preceding siblings ...)
  2009-09-21  0:01 ` fche at redhat dot com
@ 2009-09-22  9:39 ` mjw at redhat dot com
  2009-09-22  9:57 ` mjw at redhat dot com
  2009-09-22 10:10 ` [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memory machine mjw at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2009-09-22  9:39 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2009-09-22 09:39 -------
(In reply to comment #2)
> One possibility for pass-2 speedup is to clear out the various dwarf data 
> periodically by calling build_no_more if we detect we are paging.

You mean by querying getrusage() and seeing if we exceed something like
getpagesize() * get_phys_pages() (or maybe get_avphys_pages()) in some strategic
place?

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10671

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine
  2009-09-20 16:52 [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on mjw at redhat dot com
                   ` (4 preceding siblings ...)
  2009-09-22  9:39 ` mjw at redhat dot com
@ 2009-09-22  9:57 ` mjw at redhat dot com
  2009-09-22 10:10 ` [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memory machine mjw at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2009-09-22  9:57 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2009-09-22 09:56 -------
(In reply to comment #3)
> After this patch:
> 
> diff --git a/translate.cxx b/translate.cxx
> index 04a9247..c304178 100644
> --- a/translate.cxx
> +++ b/translate.cxx
> @@ -4482,7 +4482,7 @@ static void get_unwind_data (Dwfl_Module *m,
>  
>    // fetch .eh_frame info preferably from main elf file.
>    dwfl_module_info (m, NULL, &start, NULL, NULL, NULL, NULL, NULL);
> -  elf = dwfl_module_getelf(m, &bias);
> +  elf = (dwarf_getelf (dwfl_module_getdwarf (m, &bias)) ?: dwfl_module_getelf
> (m, &bias));
>    ehdr = gelf_getehdr(elf, &ehdr_mem);
>    scn = NULL;
>    while ((scn = elf_nextscn(elf, scn)))
> 
> the situation improves somewhat.

Unfortunately that doesn't work, since the dwarf elf file in a split debuginfo
scenario doesn't actually contain the actual .eh_frame section bits (the section
is there, but it is marked NO_BITS in that case). We really need the bits from
the main elf file .eh_frame section. exelib.exp has a test for this split
debuginfo case.

I don't believe .eh_frame sections contain any relocatable addresses, so it is
somewhat unfortunate that libdwfl decides to try to relocate things anyway.
Should we maybe open the main elf file directly ourselves, bypassing
dwfl_module_getelf()?

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10671

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memory machine
  2009-09-20 16:52 [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on mjw at redhat dot com
                   ` (5 preceding siblings ...)
  2009-09-22  9:57 ` mjw at redhat dot com
@ 2009-09-22 10:10 ` mjw at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2009-09-22 10:10 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2009-09-22 10:09 -------
(In reply to comment #3)
> Then, random SIGINT/backtraces appear to
> implicate translate.cxx:4740, the call to dwfl_module_address_section().

Yes, this is another case where libdwfl decides to start relocating things just
because we want to get at the section header. In this case we only need the
shdr->sh_size, not the actual data. But there seems to be no way to get that
information, through libdwfl, without triggering a full relocation of the elf file.

This is actually part of a long standing discussion between Roland and myself:
http://sourceware.org/ml/systemtap/2008-q4/msg00597.html
The basic idea is that we try to build up a search table for addresses that can
be mapped to modules, and for that we want to know the size of the module.
Roland feels this should be done differently/isn't needed. But I don't know how
else we can get this working correctly without knowing the module sizes. Note
that it could very well be that the confusion is amplified by the fact that
previous versions of elfutils had some bugs in dwfl_module_relocate_address().
The thread referenced above is mainly about working around that bug.

IMHO, like in the other case, it would be nice if libdwfl would just let us get
at the "raw" elf section header/data without feeling the need to relocate it all
unless we explicitly request it does so.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|semok/twenty.stp takes 15   |semok/twenty.stp takes 15
                   |minutes on slow/low-memorry |minutes on slow/low-memory
                   |machine                     |machine


http://sourceware.org/bugzilla/show_bug.cgi?id=10671

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-09-22 10:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-20 16:52 [Bug testsuite/10671] New: semok/twenty.stp takes 15 minutes on mjw at redhat dot com
2009-09-20 16:52 ` [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memorry machine mjw at redhat dot com
2009-09-20 18:45 ` mjw at redhat dot com
2009-09-20 23:33 ` fche at redhat dot com
2009-09-21  0:01 ` fche at redhat dot com
2009-09-22  9:39 ` mjw at redhat dot com
2009-09-22  9:57 ` mjw at redhat dot com
2009-09-22 10:10 ` [Bug testsuite/10671] semok/twenty.stp takes 15 minutes on slow/low-memory machine mjw at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).