public inbox for systemtap@sourceware.org
* PThread profiling
@ 2009-02-23 20:18 Daniel Tralamazza
  2009-02-23 23:29 ` Josh Stone
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Tralamazza @ 2009-02-23 20:18 UTC (permalink / raw)
  To: systemtap

Hi,

  I am analyzing userland synchronization primitives as part of my
research. For the past 2 months or so I've been bugging people over
IRC with user space questions and bugs (sorry mjw & fche), so I guess
it's time to show something.
  Just for a bit of context: I first tried to use dynamic probes,
e.g.: 'process("/lib64/libpthread.so.0").function("__pthread_mutex_lock")
{}'. I even did a tapset for most pthread functions inside NPTL
('probe pthread.create'). It worked fine, but the overhead was causing
measurement errors (too slow == higher contention probability). It
was clear that I had to use static markers; all I had to do was patch
glibc/nptl and voila.
  Right now I have a simple systemtap provider and a small glibc patch
(both WIP). I will continue to update this work as part of my
research, and in the meantime I am making it available to anyone
interested.

For the lazy people (like me) I have built glibc rpms for Fedora 10
x86_64; you can find them here:
http://daniel.tralamazza.com/pub/rpms.tar.gz
For the rest of you (suicidal maniacs) wanting to compile your own
glibc, I've put together a glibc.spec + patches:
http://daniel.tralamazza.com/pub/glibc.spec
http://daniel.tralamazza.com/pub/glibc-usdt.patch
http://daniel.tralamazza.com/pub/glibc-usdt-20081113T2206.tar.bz2
http://daniel.tralamazza.com/pub/pthread_probe.d  (you need this if
you want to regenerate pthread_probe.h)

And there is even an example! Because everyone always shows lock
contention I chose something different.
The script can be found here:
http://daniel.tralamazza.com/pub/lock_topshared.stp
It shows the top 10 most shared locks, i.e. locks that are accessed
by different threads, together with the total number of acquisitions
per lock.
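
For readers who cannot fetch the script, here is a minimal sketch of how
such a "most shared locks" report could be written. It is not the
contents of lock_topshared.stp: it uses the dynamic function probe quoted
earlier in this message rather than the static markers, it assumes glibc
debuginfo is installed so that $mutex resolves, and the global array
names are made up for illustration.

```systemtap
# Sketch only: array names are hypothetical, and $mutex assumes that
# debuginfo for libpthread is available.
global acquisitions   # acquisitions[lock] = total lock calls seen
global sharers        # sharers[lock, tid] = 1 once tid has used the lock
global nthreads       # nthreads[lock] = number of distinct threads

probe process("/lib64/libpthread.so.0").function("__pthread_mutex_lock")
{
    lock = $mutex
    acquisitions[lock]++
    sharers[lock, tid()] = 1
}

probe end
{
    # Collapse the (lock, tid) pairs into a per-lock thread count.
    foreach ([lock, t] in sharers)
        nthreads[lock]++

    # Print the 10 locks touched by the most distinct threads.
    foreach (lock in nthreads- limit 10)
        printf("%p threads=%d acquisitions=%d\n",
               lock, nthreads[lock], acquisitions[lock])
}
```

The counts are taken at entry to the lock function, so they are lock
attempts; a marker placed after the lock is actually taken would count
completed acquisitions instead.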

cheers,

-- 
Daniel Tralamazza
EPFL IC IIF DSLAB
INN-331

ps: The current patch doesn't contain the pthread_cond_* probes.


* Re: PThread profiling
  2009-02-23 20:18 PThread profiling Daniel Tralamazza
@ 2009-02-23 23:29 ` Josh Stone
  2009-02-25  1:55   ` Daniel Tralamazza
  2009-02-25  7:22   ` Frank Ch. Eigler
  0 siblings, 2 replies; 7+ messages in thread
From: Josh Stone @ 2009-02-23 23:29 UTC (permalink / raw)
  To: Daniel Tralamazza; +Cc: systemtap

Daniel Tralamazza wrote:
> I even did a tapset for most pthread functions inside NPTL
> ('probe pthread.create').

I think such a tapset would have general interest, even with the slower 
uprobe version.  You could define the tapset to work with both the base 
glibc and your patched version:

probe pthread.create = process(...).mark(...)!,
                        process(...).function(...)
{ ... }
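
Spelled out for one function, the pattern above might look like the
following sketch. The library path and function name are the ones Daniel
quoted; the marker name "mutex_lock" and the alias name are placeholders,
since the names used by the patched glibc are not given in this thread.

```systemtap
# Sketch only: "mutex_lock" is a placeholder marker name and
# pthread.mutex_lock a placeholder alias name.
# The "!" suffix makes the marker probe optional and sufficient: if it
# resolves, the function probe after the comma is not used; if it does
# not resolve (unpatched glibc), the function probe is used instead.
probe pthread.mutex_lock =
    process("/lib64/libpthread.so.0").mark("mutex_lock") !,
    process("/lib64/libpthread.so.0").function("__pthread_mutex_lock")
{
    thread_id = tid()
}

# A script can then use the alias without caring which variant matched.
probe pthread.mutex_lock
{
    printf("%s[%d] mutex lock\n", execname(), thread_id)
}
```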

> It worked fine, but the overhead was causing measurement errors
> (too slow == higher contention probability). It was clear that I
> had to use static markers, all I had to do was patch glibc/nptl
> and voila.

Can you share performance numbers on this?  I'd like to see the 
comparison of unprobed, function uprobes, and static markers...

Josh


* Re: PThread profiling
  2009-02-23 23:29 ` Josh Stone
@ 2009-02-25  1:55   ` Daniel Tralamazza
  2009-02-25  7:22   ` Frank Ch. Eigler
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Tralamazza @ 2009-02-25  1:55 UTC (permalink / raw)
  To: Josh Stone; +Cc: systemtap

On Mon, Feb 23, 2009 at 9:17 PM, Josh Stone <jistone@redhat.com> wrote:
> Daniel Tralamazza wrote:
>>
>> I even did a tapset for most pthread functions inside NPTL
>> ('probe pthread.create').
>
> I think such a tapset would have general interest, even with the slower
> uprobe version.  You could define the tapset to work with both the base
> glibc and your patched version:
>
> probe pthread.create = process(...).mark(...)!,
>                       process(...).function(...)
> { ... }

I uploaded a draft of the unified tapset (uprobes + static)
http://daniel.tralamazza.com/pub/pthread_tapset.stp
Thanks for the tips ;)

>
>> It worked fine, but the overhead was causing measurement errors
>> (too slow == higher contention probability). It was clear that I
>> had to use static markers, all I had to do was patch glibc/nptl
>> and voila.
>
> Can you share performance numbers on this?  I'd like to see the comparison
> of unprobed, function uprobes, and static markers...

I ran 2 web browser benchmarks on Firefox:
SunSpider: http://www2.webkit.org/perf/sunspider-0.9/sunspider-driver.html
V8: http://v8.googlecode.com/svn/data/benchmarks/v3/run.html

               baseline (no stap)     uprobes              static markers
spider         2080.4ms +/- 13.3%     couldn't complete    20001.3ms +/- 9.2%
V8             312                    55.2                 75.8
The results are particularly bad on all string functions in spider (up
to 40x overhead).
I am trying to get a proper machine to run these experiments and oprofile it.

>
> Josh
>
>


-- 
Daniel Tralamazza
EPFL IC IIF DSLAB
INN-331


* Re: PThread profiling
  2009-02-23 23:29 ` Josh Stone
  2009-02-25  1:55   ` Daniel Tralamazza
@ 2009-02-25  7:22   ` Frank Ch. Eigler
  2009-02-25  8:01     ` Josh Stone
  2009-02-25 12:52     ` Daniel Tralamazza
  1 sibling, 2 replies; 7+ messages in thread
From: Frank Ch. Eigler @ 2009-02-25  7:22 UTC (permalink / raw)
  To: Josh Stone; +Cc: Daniel Tralamazza, systemtap

Josh Stone <jistone@redhat.com> writes:

>> It worked fine, but the overhead was causing measurement errors
>> (too slow == higher contention probability). It was clear that I
>> had to use static markers, all I had to do was patch glibc/nptl
>> and voila.
>
> Can you share performance numbers on this?  I'd like to see the
> comparison of unprobed, function uprobes, and static markers...

Since user-space static markers are currently implemented in terms of
uprobes, it should not assist performance.  We may devise a different
method to jump to the kernel-side handler (e.g., some creatively
misused system call/signal that we can catch via utrace/kprobes), at
which point it could get much faster.

- FChE


* Re: PThread profiling
  2009-02-25  7:22   ` Frank Ch. Eigler
@ 2009-02-25  8:01     ` Josh Stone
  2009-02-25 12:52     ` Daniel Tralamazza
  1 sibling, 0 replies; 7+ messages in thread
From: Josh Stone @ 2009-02-25  8:01 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Daniel Tralamazza, systemtap

Frank Ch. Eigler wrote:
> Since user-space static markers are currently implemented in terms of
> uprobes, it should not assist performance.

Shouldn't it at least be able to skip the single-step?

Josh


* Re: PThread profiling
  2009-02-25  7:22   ` Frank Ch. Eigler
  2009-02-25  8:01     ` Josh Stone
@ 2009-02-25 12:52     ` Daniel Tralamazza
  2009-02-25 21:31       ` Mark Wielaard
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Tralamazza @ 2009-02-25 12:52 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Josh Stone, systemtap

I was surprised by the low performance of static markers, so I did some
oprofiling. Good news (as expected): there is no difference between
the original glibc and the one with markers added. I'm running Fedora
10 x86_64 under VMware Fusion (2 vcpus & 768 MB RAM) on a 2.4GHz
MacBook Pro. Unfortunately VMware doesn't export perf counters, so I
used oprofile in timer interrupt mode.

Here are some results http://pastebin.com/ffcf4cc3 for static markers
(sunspider benchmark on firefox). Over 1/3 of all samples come from
traps (traps_64.c:71); I didn't know that even static markers use
int3.
Sunspider results for static markers http://tinyurl.com/bsjrof
And baseline http://tinyurl.com/ct4pwm


--
Daniel


* Re: PThread profiling
  2009-02-25 12:52     ` Daniel Tralamazza
@ 2009-02-25 21:31       ` Mark Wielaard
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Wielaard @ 2009-02-25 21:31 UTC (permalink / raw)
  To: Daniel Tralamazza; +Cc: Frank Ch. Eigler, Josh Stone, systemtap

Hi Daniel,

On Wed, 2009-02-25 at 12:08 +0100, Daniel Tralamazza wrote:
> I was surprised by the low performance of static markers, so I did some
> oprofiling. Good news (as expected): there is no difference between
> the original glibc and the one with markers added.

I assume when you say "original glibc" you mean, run with dynamic
markers added instead of static markers?

> Here are some results http://pastebin.com/ffcf4cc3 for static markers
> (sunspider benchmark on firefox). Over 1/3 of all samples come from
> traps (traps_64.c:71); I didn't know that even static markers use
> int3.
> Sunspider results for static markers http://tinyurl.com/bsjrof
> And baseline http://tinyurl.com/ct4pwm

It seems the slowdown is terrible on the string tests. Any idea what
they do that they seem to hit the pthread probes so much? Or do they hit
a probe with a really high overhead?

Might it be a somewhat unfortunately placed marker that is really hit a
lot of times in this particular case? Maybe you can try running with
stap -t to get some rough statistic on how many times and with how much
overhead each probe is being hit?

Thanks,

Mark

