public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
@ 2010-04-19 18:36 mjw at redhat dot com
  2010-04-19 19:50 ` [Bug testsuite/11514] " fche at redhat dot com
                   ` (12 more replies)
  0 siblings, 13 replies; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-19 18:36 UTC (permalink / raw)
  To: systemtap

Host: Linux springer.wildebeest.org 2.6.32.11-99.fc12.x86_64 #1 SMP Mon Apr 5 19
:59:38 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
Snapshot: version 1.2/0.145 commit release-1.2-59-gfb9221e
GCC: 4.4.3 [gcc (GCC) 4.4.3 20100316 (Red Hat 4.4.3-10)]
Distro: Fedora release 12 (Constantine)

testcase /home/mark/src/systemtap/testsuite/systemtap.syscall/nd_syscall.exp
completed in 949 seconds

Versus

testcase /home/mark/src/systemtap/testsuite/systemtap.syscall/syscall.exp
completed in 84 seconds

Since they are supposed to do somewhat identical probing, this is odd.
And 15 minutes for a testcase is really too long.

-- 
           Summary: nd_syscall.exp takes more than 10 times longer than
                    syscall.exp testcase
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: testsuite
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: mjw at redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
@ 2010-04-19 19:50 ` fche at redhat dot com
  2010-04-19 20:50 ` jistone at redhat dot com
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2010-04-19 19:50 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2010-04-19 12:45 -------
Quite possibly we need a caching facility for @cast().


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
  2010-04-19 19:50 ` [Bug testsuite/11514] " fche at redhat dot com
@ 2010-04-19 20:50 ` jistone at redhat dot com
  2010-04-20  3:07 ` mjw at redhat dot com
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: jistone at redhat dot com @ 2010-04-19 20:50 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jistone at redhat dot com  2010-04-19 17:16 -------
(In reply to comment #1)
> Quite possibly we need a caching facility for @cast().

The generated modules for @cast have always been cached.  Is there a reason that
you suspect @cast is the slowdown?  Do we even know that the extra time is in
pass-2?

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
  2010-04-19 19:50 ` [Bug testsuite/11514] " fche at redhat dot com
  2010-04-19 20:50 ` jistone at redhat dot com
@ 2010-04-20  3:07 ` mjw at redhat dot com
  2010-04-20  7:59   ` Jim Keniston
  2010-04-20 12:24 ` jistone at redhat dot com
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-20  3:07 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2010-04-19 17:56 -------
Time seems to be spend in Pass 5?

$ gcc /home/mark/src/systemtap/testsuite/systemtap.syscall/clock.c   -lm   -o
/tmp/clock

$ ./run-stap -k -v testsuite/systemtap.syscall/sys.stp -c /tmp/clock
Pass 1: parsed user script and 70 library script(s) using
21108virt/13220res/2052shr kb, in 230usr/10sys/232real ms.
Pass 2: analyzed script: 641 probe(s), 123 function(s), 22 embed(s), 4 global(s)
using 189788virt/48040res/3100shr kb, in 1230usr/360sys/1601real ms.
Pass 3: translated to C into "/tmp/stapDAIhAi/stap_10047.c" using
189508virt/51668res/6840shr kb, in 1460usr/200sys/1682real ms.
Pass 4: compiled C into "stap_10047.ko" in 24680usr/900sys/25350real ms.
Pass 5: starting run.
[...]
Pass 5: run completed in 30usr/80sys/1135real ms.
Keeping temporary directory "/tmp/stapDAIhAi"

Versus:

$ ./run-stap -k -v testsuite/systemtap.syscall/nd_sys.stp -c /tmp/clock
Pass 1: parsed user script and 70 library script(s) using
21112virt/13220res/2052shr kb, in 220usr/10sys/231real ms.
Pass 2: analyzed script: 1401 probe(s), 97 function(s), 22 embed(s), 4 global(s)
using 161124virt/19324res/2940shr kb, in 370usr/340sys/711real ms.
Pass 3: translated to C into "/tmp/stapm1mCJz/stap_9513.c" using
161256virt/19624res/3148shr kb, in 320usr/50sys/366real ms.
Pass 4: compiled C into "stap_9513.ko" in 27210usr/890sys/28005real ms.
Pass 5: starting run.
[...]
Pass 5: run completed in 30usr/22210sys/23274real ms.
Keeping temporary directory "/tmp/stapm1mCJz"

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug testsuite/11514] nd_syscall.exp takes more than 10 times  longer than syscall.exp testcase
  2010-04-20  3:07 ` mjw at redhat dot com
@ 2010-04-20  7:59   ` Jim Keniston
  0 siblings, 0 replies; 17+ messages in thread
From: Jim Keniston @ 2010-04-20  7:59 UTC (permalink / raw)
  To: sourceware-bugzilla; +Cc: systemtap

On Mon, 2010-04-19 at 17:56 +0000, mjw at redhat dot com wrote:
> ------- Additional Comments From mjw at redhat dot com  2010-04-19 17:56 -------
> Time seems to be spend in Pass 5?
> 
> $ gcc /home/mark/src/systemtap/testsuite/systemtap.syscall/clock.c   -lm   -o
> /tmp/clock
> 
> $ ./run-stap -k -v testsuite/systemtap.syscall/sys.stp -c /tmp/clock
> Pass 1: parsed user script and 70 library script(s) using
> 21108virt/13220res/2052shr kb, in 230usr/10sys/232real ms.
> Pass 2: analyzed script: 641 probe(s), 123 function(s), 22 embed(s), 4 global(s)
> using 189788virt/48040res/3100shr kb, in 1230usr/360sys/1601real ms.
> Pass 3: translated to C into "/tmp/stapDAIhAi/stap_10047.c" using
> 189508virt/51668res/6840shr kb, in 1460usr/200sys/1682real ms.
> Pass 4: compiled C into "stap_10047.ko" in 24680usr/900sys/25350real ms.
> Pass 5: starting run.
> [...]
> Pass 5: run completed in 30usr/80sys/1135real ms.
> Keeping temporary directory "/tmp/stapDAIhAi"
> 
> Versus:
> 
> $ ./run-stap -k -v testsuite/systemtap.syscall/nd_sys.stp -c /tmp/clock
> Pass 1: parsed user script and 70 library script(s) using
> 21112virt/13220res/2052shr kb, in 220usr/10sys/231real ms.
> Pass 2: analyzed script: 1401 probe(s), 97 function(s), 22 embed(s), 4 global(s)
> using 161124virt/19324res/2940shr kb, in 370usr/340sys/711real ms.
> Pass 3: translated to C into "/tmp/stapm1mCJz/stap_9513.c" using
> 161256virt/19624res/3148shr kb, in 320usr/50sys/366real ms.
> Pass 4: compiled C into "stap_9513.ko" in 27210usr/890sys/28005real ms.
> Pass 5: starting run.
> [...]
> Pass 5: run completed in 30usr/22210sys/23274real ms.
> Keeping temporary directory "/tmp/stapm1mCJz"
> 

The nd_syscalls*.stp tapsets use the *_arg functions (in
tapset/*/registers.stp) to obtain the arg values.  I have no reason to
think that they're particularly slow, but I don't think I ever
benchmarked them.

Jim


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (2 preceding siblings ...)
  2010-04-20  3:07 ` mjw at redhat dot com
@ 2010-04-20 12:24 ` jistone at redhat dot com
  2010-04-20 14:01 ` jistone at redhat dot com
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: jistone at redhat dot com @ 2010-04-20 12:24 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jkenisto at linux dot vnet dot ibm dot com  2010-04-19 18:36 -------
Subject: Re:  nd_syscall.exp takes more than 10 times
 longer than syscall.exp testcase

On Mon, 2010-04-19 at 17:56 +0000, mjw at redhat dot com wrote:
> ------- Additional Comments From mjw at redhat dot com  2010-04-19 17:56 -------
> Time seems to be spend in Pass 5?
> 
> $ gcc /home/mark/src/systemtap/testsuite/systemtap.syscall/clock.c   -lm   -o
> /tmp/clock
> 
> $ ./run-stap -k -v testsuite/systemtap.syscall/sys.stp -c /tmp/clock
> Pass 1: parsed user script and 70 library script(s) using
> 21108virt/13220res/2052shr kb, in 230usr/10sys/232real ms.
> Pass 2: analyzed script: 641 probe(s), 123 function(s), 22 embed(s), 4 global(s)
> using 189788virt/48040res/3100shr kb, in 1230usr/360sys/1601real ms.
> Pass 3: translated to C into "/tmp/stapDAIhAi/stap_10047.c" using
> 189508virt/51668res/6840shr kb, in 1460usr/200sys/1682real ms.
> Pass 4: compiled C into "stap_10047.ko" in 24680usr/900sys/25350real ms.
> Pass 5: starting run.
> [...]
> Pass 5: run completed in 30usr/80sys/1135real ms.
> Keeping temporary directory "/tmp/stapDAIhAi"
> 
> Versus:
> 
> $ ./run-stap -k -v testsuite/systemtap.syscall/nd_sys.stp -c /tmp/clock
> Pass 1: parsed user script and 70 library script(s) using
> 21112virt/13220res/2052shr kb, in 220usr/10sys/231real ms.
> Pass 2: analyzed script: 1401 probe(s), 97 function(s), 22 embed(s), 4 global(s)
> using 161124virt/19324res/2940shr kb, in 370usr/340sys/711real ms.
> Pass 3: translated to C into "/tmp/stapm1mCJz/stap_9513.c" using
> 161256virt/19624res/3148shr kb, in 320usr/50sys/366real ms.
> Pass 4: compiled C into "stap_9513.ko" in 27210usr/890sys/28005real ms.
> Pass 5: starting run.
> [...]
> Pass 5: run completed in 30usr/22210sys/23274real ms.
> Keeping temporary directory "/tmp/stapm1mCJz"
> 

The nd_syscalls*.stp tapsets use the *_arg functions (in
tapset/*/registers.stp) to obtain the arg values.  I have no reason to
think that they're particularly slow, but I don't think I ever
benchmarked them.

Jim



------- Additional Comments From jistone at redhat dot com  2010-04-19 20:33 -------
I believe this is because kprobes is doing a *linear* search through kallsyms
when registering by symbol name, and we have 1400 probes to register.

Here are the top-5 perf results for staprun on this script: 
  probe nd_syscall.** {} probe begin { exit() }

# Samples: 137286
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#
    90.20%   stapio  [kernel]       [k] kallsyms_expand_symbol
     4.12%   stapio  [kernel]       [k] strcmp
     3.15%   stapio  [kernel]       [k] kallsyms_lookup_name
     0.56%   stapio  [kernel]       [k] read_hpet
     0.40%   stapio  [kernel]       [k] mod_find_symname


For comparison, here's with normal syscalls:
  probe syscall.** {} probe begin { exit() }

# Samples: 1051
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#
     2.38%   stapio  [kernel]       [k] __link_path_walk
     2.38%   stapio  [kernel]       [k] _spin_lock
     2.28%   stapio  [kernel]       [k] clear_page_c
     2.00%   stapio  /lib64/libc-2.11.1.so  [.] _IO_vfscanf_internal
     1.81%   stapio  [kernel]       [k] copy_user_generic_string


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (3 preceding siblings ...)
  2010-04-20 12:24 ` jistone at redhat dot com
@ 2010-04-20 14:01 ` jistone at redhat dot com
  2010-04-20 17:28 ` mjw at redhat dot com
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: jistone at redhat dot com @ 2010-04-20 14:01 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jistone at redhat dot com  2010-04-19 20:50 -------
>   probe nd_syscall.** {} probe begin { exit() }
> 
> # Samples: 137286

>   probe syscall.** {} probe begin { exit() }
> 
> # Samples: 1051

I should point out that this comparison isn't completely fair, as there are 1400
nd_syscall.** probepoints and only 794 syscall.** probepoints.  That leaves 606
nd_syscalls that will fail (mostly the "SyS_*" variants), and nonexistence is
the worst-case scenario for linear searches.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (4 preceding siblings ...)
  2010-04-20 14:01 ` jistone at redhat dot com
@ 2010-04-20 17:28 ` mjw at redhat dot com
  2010-04-20 18:22 ` mjw at redhat dot com
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-20 17:28 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2010-04-20 12:24 -------
(In reply to comment #5)
> I believe this is because kprobes is doing a *linear* search through kallsyms
> when registering by symbol name, and we have 1400 probes to register.

And it does a linear search at each register_kprobe (and it would even do a
linear search for each one, if we registered them all at once with
register_kprobes since that is just simple loop over register_kprobe).

Resolving them all in one go with kallsyms_on_each_symbol() just before
registering then in kprobe_derived_probe_group::emit_module_init() seems the
quickest way to speed this up.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (5 preceding siblings ...)
  2010-04-20 17:28 ` mjw at redhat dot com
@ 2010-04-20 18:22 ` mjw at redhat dot com
  2010-04-20 18:22 ` mjw at redhat dot com
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-20 18:22 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2010-04-20 14:55 -------
(In reply to comment #8)
> (In reply to comment #7)
> > Resolving them all in one go with kallsyms_on_each_symbol() just before
> > registering then in kprobe_derived_probe_group::emit_module_init() seems the
> > quickest way to speed this up.
> 
> Frank correctly pointed out on irc that this isn't a speedup, but jus flips the
> comparison loop inside out. We still need to compare all kallsyms symbols
> against all kprobe symbol names. So we still end up with O(N*M) comparisons. So
> to speed up we really need to be using more clever data structures.

Except that most time seems to go into kallsyms_expand_symbol to uncompress the
kernel symbol table. So flipping the comparison loop inside out to only do that
once might still be a win.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (6 preceding siblings ...)
  2010-04-20 18:22 ` mjw at redhat dot com
@ 2010-04-20 18:22 ` mjw at redhat dot com
  2010-04-21 10:30 ` mjw at redhat dot com
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-20 18:22 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2010-04-20 14:38 -------
(In reply to comment #7)
> Resolving them all in one go with kallsyms_on_each_symbol() just before
> registering then in kprobe_derived_probe_group::emit_module_init() seems the
> quickest way to speed this up.

Frank correctly pointed out on irc that this isn't a speedup, but jus flips the
comparison loop inside out. We still need to compare all kallsyms symbols
against all kprobe symbol names. So we still end up with O(N*M) comparisons. So
to speed up we really need to be using more clever data structures.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (7 preceding siblings ...)
  2010-04-20 18:22 ` mjw at redhat dot com
@ 2010-04-21 10:30 ` mjw at redhat dot com
  2010-04-21 19:10 ` mjw at redhat dot com
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-21 10:30 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2010-04-20 20:18 -------
(In reply to comment #9)
> Except that most time seems to go into kallsyms_expand_symbol to uncompress the
> kernel symbol table. So flipping the comparison loop inside out to only do that
> once might still be a win.

Yep, seems like this is a pretty nice win:

testcase /home/mark/src/systemtap/testsuite/systemtap.syscall/syscall.exp
completed in 96 seconds

versus

testcase /home/mark/src/systemtap/testsuite/systemtap.syscall/nd_syscall.exp com
pleted in 102 seconds


commit bd6593518895fc613b973500c03a3b844bbe68c5
Author: Mark Wielaard <mjw@redhat.com>
Date:   Tue Apr 20 22:06:15 2010 +0200

    PR11514 Only expand kallsyms once for kprobe_derived_probe_group.
    
    kprobe_register is really expensive when used for multiple symbol
    based probes since it will call kallsyms_expand_symbol over and over
    to uncompress the kernel symbol table. So flip the comparison loop
    inside out by looping once over the kernel symbol table.
    
    * tapsets.cxx (kprobe_derived_probe_group::emit_module_decls): Declare
      kprobe_resolve helper function.
      (kprobe_derived_probe_group::emit_module_init): Call kprobe_resolve
      through kallsyms_on_each_symbol. Never register through symbol name,
      just check that address has been filled in by kprobe_resolve.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (8 preceding siblings ...)
  2010-04-21 10:30 ` mjw at redhat dot com
@ 2010-04-21 19:10 ` mjw at redhat dot com
  2010-04-21 19:50 ` jistone at redhat dot com
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-21 19:10 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2010-04-21 10:30 -------
Some follow up patches:

- error/warning reporting was broken, fixed by:

commit 9319b767b63e39e44ad5ece313a98c973dcc25ef
Author: Mark Wielaard <mjw@redhat.com>
Date:   Tue Apr 20 23:03:30 2010 +0200

    Warn if kprobe symbol name not found on non-optional probe.
    
    * tapsets.cxx (kprobe_derived_probe_group::emit_module_init): Call
      _stp_warn when symbol name couldn't be resolved to an address and
      the probe isn't optional like badkprobe.exp expects.

- kallsyms_on_each_symbol isn't available on older kernels, only use it with
newer ones:

commit 03a4ec63de7060914b01320f6e840e0bc2d0beda
Author: Mark Wielaard <mjw@redhat.com>
Date:   Wed Apr 21 11:21:07 2010 +0200

    PR11514 Only use kallsyms_on_each_symbol speedup when available.
    
    * runtime/autoconf-kallsyms-on-each-symbol.c: New check.
    * buildrun.cxx (compile_pass): Add new check to output_autoconf.
    * tapsets.cxx (kprobe_derived_probe_group::emit_module_decls): Only declare
      kprobe_resolve when STAPCONF_KALLSYMS_ON_EACH_SYMBOL defined.
      (kprobe_derived_probe_group::emit_module_init): Check whether to invoke
      kallsyms_on_each_symbol optimization.

- Some micro-optimizations suggested on irc:

commit fc1d2aa28ccee8c722d545b5215ae34f94514b18
Author: Mark Wielaard <mjw@redhat.com>
Date:   Wed Apr 21 11:58:09 2010 +0200

    Add some kallsyms_on_each_symbol micro-optimisations.
    
    These only help when your kprobe group consists of a list of symbols
    that can all be resolved.
    
    * tapsets.cxx (kprobe_derived_probe_group::emit_module_init): Calculate
      how many probe symbols need to be resolved. Pass to kprobe_resolve.
      (kprobe_derived_probe_group::emit_module_decls): In kprobe_resolve
      keep track of how many kprobe symbols haven't resolved yet. Return
      -1 when all done.


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (9 preceding siblings ...)
  2010-04-21 19:10 ` mjw at redhat dot com
@ 2010-04-21 19:50 ` jistone at redhat dot com
  2010-04-22 19:46   ` Ananth N Mavinakayanahalli
  2010-04-21 19:50 ` mjw at redhat dot com
  2010-04-22 22:39 ` mjw at redhat dot com
  12 siblings, 1 reply; 17+ messages in thread
From: jistone at redhat dot com @ 2010-04-21 19:50 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jistone at redhat dot com  2010-04-21 19:05 -------
(In reply to comment #10)
> commit bd6593518895fc613b973500c03a3b844bbe68c5
> Author: Mark Wielaard <mjw@redhat.com>
> Date:   Tue Apr 20 22:06:15 2010 +0200
> 
>     PR11514 Only expand kallsyms once for kprobe_derived_probe_group.
>     
>     kprobe_register is really expensive when used for multiple symbol
>     based probes since it will call kallsyms_expand_symbol over and over
>     to uncompress the kernel symbol table. So flip the comparison loop
>     inside out by looping once over the kernel symbol table.
>     
>     * tapsets.cxx (kprobe_derived_probe_group::emit_module_decls): Declare
>       kprobe_resolve helper function.
>       (kprobe_derived_probe_group::emit_module_init): Call kprobe_resolve
>       through kallsyms_on_each_symbol. Never register through symbol name,
>       just check that address has been filled in by kprobe_resolve.

There's a gotcha about this that I mentioned on IRC.  On most archs,
kprobe_lookup_name() is a simple wrapper around kallsyms_lookup_name(), so this
new optimization is perfectly valid.

But on PPC64, it's a little more complicated to deal with function descriptors.
 In some cases the returned address is actually a pointer to the real address,
and in other cases it tries to search again with a '.' added to the name.  We
need to deal with this too if we're bypassing kprobe_lookup_name().


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (10 preceding siblings ...)
  2010-04-21 19:50 ` jistone at redhat dot com
@ 2010-04-21 19:50 ` mjw at redhat dot com
  2010-04-22 22:39 ` mjw at redhat dot com
  12 siblings, 0 replies; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-21 19:50 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2010-04-21 19:10 -------
Lets reopen this bug then till we got someone testing things against a ppc setup
on a modern kernel.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug testsuite/11514] nd_syscall.exp takes more than 10 times  longer than syscall.exp testcase
  2010-04-21 19:50 ` jistone at redhat dot com
@ 2010-04-22 19:46   ` Ananth N Mavinakayanahalli
  2010-04-22 23:02     ` Mark Wielaard
  0 siblings, 1 reply; 17+ messages in thread
From: Ananth N Mavinakayanahalli @ 2010-04-22 19:46 UTC (permalink / raw)
  To: jistone at redhat dot com; +Cc: systemtap

On Wed, Apr 21, 2010 at 07:05:39PM -0000, jistone at redhat dot com wrote:
> 
> But on PPC64, it's a little more complicated to deal with function descriptors.
>  In some cases the returned address is actually a pointer to the real address,
> and in other cases it tries to search again with a '.' added to the name.  We
> need to deal with this too if we're bypassing kprobe_lookup_name().

Right, on PPC64, the symbol table address entry foo, for a function foo()
is a data address while .foo is the actual text address. In the past,
the System.map would contain both foo and .foo; RHEL5 on, the .foo
entries have been omitted from there.

If possible, I'd suggest that you don't bypass kprobe_lookup_name() for
debuginfoless probes, since it handles more cases like symbol clashes
between modules and such; let the kernel resolve the addresses for us.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug testsuite/11514] nd_syscall.exp takes more than 10 times longer than syscall.exp testcase
  2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
                   ` (11 preceding siblings ...)
  2010-04-21 19:50 ` mjw at redhat dot com
@ 2010-04-22 22:39 ` mjw at redhat dot com
  12 siblings, 0 replies; 17+ messages in thread
From: mjw at redhat dot com @ 2010-04-22 22:39 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From ananth at in dot ibm dot com  2010-04-22 03:46 -------
Subject: Re:  nd_syscall.exp takes more than 10 times
	longer than syscall.exp testcase

On Wed, Apr 21, 2010 at 07:05:39PM -0000, jistone at redhat dot com wrote:
> 
> But on PPC64, it's a little more complicated to deal with function descriptors.
>  In some cases the returned address is actually a pointer to the real address,
> and in other cases it tries to search again with a '.' added to the name.  We
> need to deal with this too if we're bypassing kprobe_lookup_name().

Right, on PPC64, the symbol table address entry foo, for a function foo()
is a data address while .foo is the actual text address. In the past,
the System.map would contain both foo and .foo; RHEL5 on, the .foo
entries have been omitted from there.

If possible, I'd suggest that you don't bypass kprobe_lookup_name() for
debuginfoless probes, since it handles more cases like symbol clashes
between modules and such; let the kernel resolve the addresses for us.

------- Additional Comments From mjw at redhat dot com  2010-04-22 09:45 -------
Lets just disable to optimization on ppc64, which seems to be the only
architecture with different function descriptors.

commit 2cbd952ba29e03641ddfa717a2996b93594a5106
Author: Mark Wielaard <mjw@redhat.com>
Date:   Thu Apr 22 11:42:27 2010 +0200

    PR11514 Disable kallsyms_on_each_symbol optimization on ppc64.
    
    * runtime/autoconf-kallsyms-on-each-symbol.c: #error on #ifdef CONFIG_PPC64.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


http://sourceware.org/bugzilla/show_bug.cgi?id=11514

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Bug testsuite/11514] nd_syscall.exp takes more than 10 times  longer than syscall.exp testcase
  2010-04-22 19:46   ` Ananth N Mavinakayanahalli
@ 2010-04-22 23:02     ` Mark Wielaard
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Wielaard @ 2010-04-22 23:02 UTC (permalink / raw)
  To: ananth; +Cc: systemtap

On Thu, 2010-04-22 at 09:15 +0530, Ananth N Mavinakayanahalli wrote:
> On Wed, Apr 21, 2010 at 07:05:39PM -0000, jistone at redhat dot com wrote:
> > 
> > But on PPC64, it's a little more complicated to deal with function descriptors.
> >  In some cases the returned address is actually a pointer to the real address,
> > and in other cases it tries to search again with a '.' added to the name.  We
> > need to deal with this too if we're bypassing kprobe_lookup_name().
> 
> Right, on PPC64, the symbol table address entry foo, for a function foo()
> is a data address while .foo is the actual text address. In the past,
> the System.map would contain both foo and .foo; RHEL5 on, the .foo
> entries have been omitted from there.
>
> If possible, I'd suggest that you don't bypass kprobe_lookup_name() for
> debuginfoless probes, since it handles more cases like symbol clashes
> between modules and such; let the kernel resolve the addresses for us.

The optimization uses the kernel to resolve the addresses for us, but
through kallsyms_on_each_symbol() instead of
kprobe_lookup_name/kallsyms_lookup_name(). Since it does indeed seem
that doesn't do anything special for PPC64 function descriptors I have
disabled the optimization on PPC64 for now (commit 2cbd95). But you
might want to double check that kallsyms_on_each_symbol() works as
expected on PPC64 anyway, since it really is a lot more efficient when
having to lookup multiple (possible non-existing) symbol names. The
optimization was added because kallsyms_lookup_name() needs to
uncompress the kernel symbol table on each pass.

If it can be made to work on every architecture we might want to push a
similar optimization upstream to make register_kprobes() do the same. We
are currently not using register_kprobes() because a) it is as slow as
looping over register_kprobe() ourselves, and b) because we allow
registration failures if the probe is "optional". But both could be
added to the upstream interface if needed, and that would then benefit
any consumer that tries to register lots of kprobes by symbol name.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-04-22  9:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-19 18:36 [Bug testsuite/11514] New: nd_syscall.exp takes more than 10 times longer than syscall.exp testcase mjw at redhat dot com
2010-04-19 19:50 ` [Bug testsuite/11514] " fche at redhat dot com
2010-04-19 20:50 ` jistone at redhat dot com
2010-04-20  3:07 ` mjw at redhat dot com
2010-04-20  7:59   ` Jim Keniston
2010-04-20 12:24 ` jistone at redhat dot com
2010-04-20 14:01 ` jistone at redhat dot com
2010-04-20 17:28 ` mjw at redhat dot com
2010-04-20 18:22 ` mjw at redhat dot com
2010-04-20 18:22 ` mjw at redhat dot com
2010-04-21 10:30 ` mjw at redhat dot com
2010-04-21 19:10 ` mjw at redhat dot com
2010-04-21 19:50 ` jistone at redhat dot com
2010-04-22 19:46   ` Ananth N Mavinakayanahalli
2010-04-22 23:02     ` Mark Wielaard
2010-04-21 19:50 ` mjw at redhat dot com
2010-04-22 22:39 ` mjw at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).