public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* RE: offline elfutils processing committed
@ 2006-11-06 22:41 Stone, Joshua I
  2006-11-07  4:37 ` Roland McGrath
  2006-11-07  6:36 ` Martin Hunt
  0 siblings, 2 replies; 20+ messages in thread
From: Stone, Joshua I @ 2006-11-06 22:41 UTC (permalink / raw)
  To: Martin Hunt, Frank Ch. Eigler; +Cc: systemtap

On Monday, November 06, 2006 1:18 PM, Martin Hunt wrote:
> The point is damage control.  Systemtap allocates too much memory and
> oom killer gets active, the first thing it will kill is staprun and
> that should unload the module (but this seems broken at the moment). 
> So we haven't really hurt the system.

The goal is fine, but I don't think this accomplishes it.  My
understanding is that __alloc_pages will keep calling OOM until it is
able to satisfy the request -- thus the module is blocked waiting for
memory.  The process might end up something like:

stap module: allocate lots of memory
__alloc_pages: Not enough memory -> OOM kill something (staprun)
__alloc_pages: Still not enough memory -> OOM kill other stuff
__alloc_pages: Yay, now we have enough memory!
stap module: got some memory
stap module: Oops, staprun is gone, better exit...

Ouch...


Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread
* RE: offline elfutils processing committed
@ 2006-11-02 16:42 Stone, Joshua I
  0 siblings, 0 replies; 20+ messages in thread
From: Stone, Joshua I @ 2006-11-02 16:42 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Wednesday, November 01, 2006 2:22 PM, Frank Ch. Eigler wrote:
> Hi -
> 
> On Wed, Nov 01, 2006 at 02:02:23PM -0800, Stone, Joshua I wrote:
> 
>> [...]  The easy-out answer is that stp_random_pm deals with ints,
>> and I needed the randomization to be 64-bit.  [...]
> 
> Yeah, might as well upgrade the runtime's random_pm function.

Upgrade?  Or create a new one as a 64-bit alternative?

>> I also thought that using the kernel's get_random_bytes would yield
>> better randomization [...]
> 
> The problem there is that this kernel facility uses locks of its own,
> which would constitute another probing hazard area.

I think it's ok in this case.  It helps that we know what context we're
calling from -- timers are run in soft-irqs.  The locks via
get_random_bytes use irqsave, so I don't think there's any way for us to
trap here.

>> On a related note, you broke the randomization on jiffies timers when
>> used as ms.  [...]
> 
> Sorry about that.  I probably broke one or two other things by
> mistake -- or out of spite! :-)

Yes, spite would be just like you... ;)

> How about this: rework enter_hrtimer_probe, so that the
> timer expiration computation is done before the
> common_probe_entryfn_prologue.  "But you can't have statements so
> early!" I hear you cry.  Ahem!  Yes, you can - if it's nested within a
> new block.  So:
> 
>   enter_hrtimer_probe (...) {
>     timer->expires = [...];
>     { /* <<<< */
>        ... prologue etc.
> 
>        ... epilogue
>     }
>     return HRTIMER_RESTART;
>  }

Ok, and we'll need a similar thing for enter_timer_probe.  I'm worried
about this introducing races in shutdown though.  If we're trying to
remove the timers, then the timer body shouldn't restart itself.  I
suppose I could check for STAP_SESSION_STARTING/_RUNNING before
reloading the timer -- but we'll also need to change it so that when
init fails the cleanup takes us out of STARTING mode.


Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread
* RE: offline elfutils processing committed
@ 2006-11-01 22:22 Stone, Joshua I
  2006-11-01 22:57 ` Frank Ch. Eigler
  0 siblings, 1 reply; 20+ messages in thread
From: Stone, Joshua I @ 2006-11-01 22:22 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Tuesday, October 31, 2006 6:13 PM, Frank Ch. Eigler wrote:
> Hi -
> 
> A big commit went in just now [...]

cvsps | diffstat says:
 11 files changed, 1058 insertions(+), 1849 deletions(-)

Big indeed!  I hope the testsuite is up to the task... :)

I'll address the XXX notes you added to the hrtimer stuff:


In hrtimer_derived_probe_group::emit_interval:
// XXX: why not use stp_random_pm instead of this?
(re: code that generates randomization of timer interval)

The easy-out answer is that stp_random_pm deals with ints, and I needed
the randomization to be 64-bit.  I'll concede that it's probably better
to add an stp_random_pm_64.  I also thought that using the kernel's
get_random_bytes would yield better randomization than the pseudo-random
generator in stp_random_pm.  I guess the quality of randomization could
be addressed in the runtime function.

On a related note, you broke the randomization on jiffies timers when
used as ms.  The random value needs to be added *before* the conversion
to jiffies.  It's also an optimization to gate the random-call on
whether a randomization value is present.  I'll make these corrections
shortly.


In hrtimer_derived_probe_group::emit_module_decls:
// presume problem with updating ->expires or something else XXX
(re: the restart_or_not variable)

This was a fix for bz #3278.  The problem was that at the end of a
session, the probe prologue skipped skipped over the probe, and the
timer got restarted with the same expiration.  With this change it only
restarts if the expiration was updated.

But, this brings to mind a potential race in the startup, with both
hrtimers and jiffies-timers.  In both cases, the timer only gets
restarted if we run the probe, which only happens if the state is
STAP_SESSION_RUNNING.  If a newly registered timer fires off while we're
still initializing other code (STAP_SESSION_STARTING), the timer won't
be restarted, and it will be useless for the whole session.

I'm actually surprised we haven't had a problem with this yet, as it's a
pretty simple race condition.  I guess there's not many people using
very short timer intervals.  In any case, I'm not sure of an elegant way
to fix it -- suggestions?


Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread
* offline elfutils processing committed
@ 2006-11-01  4:48 Frank Ch. Eigler
  2006-11-01 20:37 ` Roland McGrath
  0 siblings, 1 reply; 20+ messages in thread
From: Frank Ch. Eigler @ 2006-11-01  4:48 UTC (permalink / raw)
  To: systemtap

[-- Attachment #1: Type: text/plain, Size: 1395 bytes --]

Hi -

A big commit went in just now, as a holloween prank no doubt.  Some
dude named "fche" finally checked in code to switch to elfutils'
"offline" processing mode, which allows generation of instrumentation
for kernels where the specific list of modules and their addresses
need not be known until run-time.  This involves a somewhat longer
elaboration pass, since all possibly-needed modules' dwarf data are
brought in, not just those that are loaded right now on the
translator's host.

This part needs one more bit of code from staprun (hunt) to pass a
module/section/address table to the module.  A limited mockup is
present in the generated code and should be replaced shortly.
(dwarf_derived_probe_group::emit_module_decls).

The new code also significantly improves the code generated for
scripts involving many probes.  (It also drastically simplifies the
translator's own code for this.  Partly unraveling this mess took lots
of time.)  All this builds on but replaces the earlier "probe groups"
effort.  You may find first-time compile time for complex scripts to
be noticeably shorter.  Since it is compatible with the recent caching
code, second-time compile time is near-zero.

Just to finally commit this piece of work, I just disabled some
previously working but little-used code.  I'll plop them back in
shortly: the benchmarking option, and perfmon/mark based probes.

- FChE

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2006-11-07 16:39 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-11-06 22:41 offline elfutils processing committed Stone, Joshua I
2006-11-07  4:37 ` Roland McGrath
2006-11-07  6:36 ` Martin Hunt
2006-11-07 15:13   ` Frank Ch. Eigler
2006-11-07 15:43     ` Martin Hunt
2006-11-07 16:17       ` Frank Ch. Eigler
2006-11-07 16:39         ` Martin Hunt
2006-11-08  1:26           ` Frank Ch. Eigler
  -- strict thread matches above, loose matches on Subject: below --
2006-11-02 16:42 Stone, Joshua I
2006-11-01 22:22 Stone, Joshua I
2006-11-01 22:57 ` Frank Ch. Eigler
2006-11-01  4:48 Frank Ch. Eigler
2006-11-01 20:37 ` Roland McGrath
2006-11-04  3:29   ` Frank Ch. Eigler
2006-11-04  4:05     ` Roland McGrath
2006-11-04 14:27       ` Frank Ch. Eigler
2006-11-06 20:55     ` Martin Hunt
2006-11-06 21:18       ` Frank Ch. Eigler
2006-11-06 21:52         ` Martin Hunt
2006-11-06 22:15           ` Frank Ch. Eigler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).