public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* RE: [Bug translator/1276] support more timer varieties
@ 2005-12-08 21:55 Stone, Joshua I
  0 siblings, 0 replies; 12+ messages in thread
From: Stone, Joshua I @ 2005-12-08 21:55 UTC (permalink / raw)
  To: Keshavamurthy, Anil S; +Cc: systemtap

Keshavamurthy, Anil S wrote:
> 	We fixed similar return probe bug in RHEL4-U3 kernel. So, Can
you
> please try the same test on  RHEL4-U3 Beta and get back to the
> mailing list. 

I updated only the kernel-* rpms - that's all that matters here, right?
I'm now on 2.6.9-24.ELsmp...

I get the same result, only with the line number changed: "Kernel BUG at
kprobes:440".  This line 440 is still the same BUG_ON check as I posted
before.

Here is the script I use to trigger it:

  global i
  probe begin { log("starting probe") }
  probe kernel.function("__switch_to").return { ++i }
  probe end { printf("ending probe (%d)\n", i) }

I also trimmed it down to a very small C file that still exhibits the
behavior, included below.

Let me know if there's more info I can provide...

Thanks,

Josh


===================================================================
#include <linux/module.h>
#include <linux/kprobes.h>

#ifdef ARCH_SUPPORTS_KRETPROBES
static int
probe_enter (struct kretprobe_instance *probe_instance, struct pt_regs
*regs) {
  return 0;
}

//
kernel.function("__switch_to@arch/x86_64/kernel/process.c:433").return
static struct kretprobe probe = {
  .kp.addr= (void *) 0xffffffff8010ebea,
  .handler = &probe_enter,
  .maxactive = 0,
};
#endif /* ARCH_SUPPORTS_KRETPROBES */

int init_module(void) {
  #ifdef ARCH_SUPPORTS_KRETPROBES
  return register_kretprobe (&probe);
  #else
  return -1;
  #endif
}

void cleanup_module(void) {
  #ifdef ARCH_SUPPORTS_KRETPROBES
  unregister_kretprobe (&probe);
  #endif
}

MODULE_DESCRIPTION("systemtap probe");
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 12+ messages in thread
* RE: [Bug translator/1276] support more timer varieties
@ 2005-12-08 22:01 Keshavamurthy, Anil S
  0 siblings, 0 replies; 12+ messages in thread
From: Keshavamurthy, Anil S @ 2005-12-08 22:01 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

 
>Keshavamurthy, Anil S wrote:
>> 	We fixed similar return probe bug in RHEL4-U3 kernel. 
>So, Can you
>> please try the same test on  RHEL4-U3 Beta and get back to the
>> mailing list. 
>
>I updated only the kernel-* rpms - that's all that matters 
>here, right?  

Yup, that is correct.

>Let me know if there's more info I can provide...

I think I have all the info need to reproduce this and I will debug and
will get back to you.

Thanks,
Anil

^ permalink raw reply	[flat|nested] 12+ messages in thread
* RE: [Bug translator/1276] support more timer varieties
@ 2005-12-08  5:35 Keshavamurthy, Anil S
  0 siblings, 0 replies; 12+ messages in thread
From: Keshavamurthy, Anil S @ 2005-12-08  5:35 UTC (permalink / raw)
  To: Stone, Joshua I, systemtap

Josh,
	We fixed similar return probe bug in RHEL4-U3 kernel. So, Can
you please try the same 
test on  RHEL4-U3 Beta and get back to the mailing list.

Thanks,
Anil Keshavamurthy
>-----Original Message-----
>From: systemtap-owner@sourceware.org 
>[mailto:systemtap-owner@sourceware.org] On Behalf Of Stone, Joshua I
>Sent: Wednesday, December 07, 2005 6:40 PM
>To: systemtap@sources.redhat.com
>Subject: RE: [Bug translator/1276] support more timer varieties
>
>Stone, Joshua I wrote:
>> When I was writing the test script, I did find a couple of 
>probes that
>> failed, but it was unrelated to "current" dereferencing.  I will
>> follow up in another email...
>
>As promised, here they are.  I tried these with very simple 
>probe bodies
>(increment a global number), so I know the crashes have nothing to do
>with "current" dereferencing.
>
>The first one that failed on me was 'kernel.inline("get_current")'.  I
>figured this would be a dangerous one, but I wanted to try it.  This
>matched 2710 locations for me, and probably a lot of them are in
>critical locations.  When I ran this, I got "NMI Watchdog detected
>LOCKUP".  It probably isn't worth trying to make this one work, but if
>we have a blacklist, this should be on it.
>
>The second one that failed was 'kernel.function("__switch_to").return'.
>This one is a problem with kretprobes only, as all of my other 
>probes in
>__switch_to behaved just fine, even in the middle of the function.
>Running this gave "Kernel BUG at kprobes:449" (the full dump 
>is included
>below).  The line mentioned is in trampoline_probe_handler:
>
>    BUG_ON(!orig_ret_address || (orig_ret_address ==
>trampoline_address));
>
>It seems pretty obvious that the actions taken in __switch_to would
>conflict with the way the trampoline works.  I don't know if it's
>possible to make kretprobes work on this function - if not, it 
>should be
>blacklisted.
>
>
>Josh
>
>
>crashdump from kernel.function("__switch_to").return
>===================================================================
>Kernel BUG at kprobes:449
>invalid operand: 0000 [1] SMP
>CPU 1
>Modules linked in: stap_4037(U) nfsd exportfs lockd md5 ipv6
>parport_pc lp parport autofs4 i2c_dev i2c_core smbfs sunrpc ds
>yenta_socket pcmcia_core ipt_REJECT ipt_state ip_conntrack
>iptable_filter ip_tables dm_mirror dm_mod button battery ac
>joydev uhci_hcd ehci_hcd shpchp hw_random ata_piix snd_azx
>snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
>soundcore snd_page_alloc sk98lin(U) floppy ext3 jbd ahci libata
>sd_mod scsi_mod
>Pid: 4006, comm: sshd Tainted: GF     2.6.9-22.ELsmp
>RIP: 0010:[<ffffffff80121809>]
><ffffffff80121809>{trampoline_probe_handler+130}
>RSP: 0018:0000010071d81b28  EFLAGS: 00010002
>RAX: 0000000000000001 RBX: 000001007f9d4600 RCX: ffffffffa0000000
>RDX: 0000000000000001 RSI: 0000010071d81bf8 RDI: 0000000000000000
>RBP: 0000000000000000 R08: 0000010037e14000 R09: 0000000000000003
>R10: 00000000000000ff R11: 0000000000000003 R12: 0000000000000000
>R13: 0000010071d81bf8 R14: 0000000000000000 R15: 0000010002c1a5e0
>FS:  0000002a96a33280(0000) GS:ffffffff804d3180(0000)
>knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 000000383c38f070 CR3: 0000000037e18000 CR4: 00000000000006e0
>Process sshd (pid: 4006, threadinfo 0000010071d80000, task
>0000010076dc3030)
>Stack: ffffffff803d16a0 ffffffff803d16a0 ffffffff80121785
>0000010071d81bf8
>       ffffffff80121786 ffffffff80121751 ffffffff803db140
>0000010071d81bb8
>       0000000000000002 0000010076dc3030
>Call Trace:
>       <ffffffff80121785>{kretprobe_trampoline+0}
>       <ffffffff80121786>{kretprobe_trampoline+1}
>       <ffffffff80121751>{kprobe_handler+559}
>       <ffffffff80121a27>{kprobe_exceptions_notify+40}
>       <ffffffff801432fb>{notifier_call_chain+31}
>       <ffffffff80112460>{do_int3+66}
>       <ffffffff80110aed>{error_exit+0}
>       <ffffffff80121786>{kretprobe_trampoline+1}
>       <ffffffff80121785>{kretprobe_trampoline+0}
>       <ffffffff80222bad>{tty_ldisc_try+60}
>       <ffffffff80303924>{schedule_timeout+101}
>       <ffffffff80222ce0>{tty_ldisc_deref+103}
>       <ffffffff8018902b>{do_select+939}
>       <ffffffff80188bc5>{__pollwait+0}
>       <ffffffff801893aa>{sys_select+820}
>       <ffffffff801908fc>{dnotify_parent+34}
>       <ffffffff80110052>{system_call+126}
>
>Code: 0f 0b 64 b8 31 80 ff ff ff ff c1 01 49 89 ad 80 00 00 00 e8
>RIP <ffffffff80121809>{trampoline_probe_handler+130} RSP
><0000010071d81b28>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread
* RE: [Bug translator/1276] support more timer varieties
@ 2005-12-08  2:40 Stone, Joshua I
  0 siblings, 0 replies; 12+ messages in thread
From: Stone, Joshua I @ 2005-12-08  2:40 UTC (permalink / raw)
  To: systemtap

Stone, Joshua I wrote:
> When I was writing the test script, I did find a couple of probes that
> failed, but it was unrelated to "current" dereferencing.  I will
> follow up in another email...

As promised, here they are.  I tried these with very simple probe bodies
(increment a global number), so I know the crashes have nothing to do
with "current" dereferencing.

The first one that failed on me was 'kernel.inline("get_current")'.  I
figured this would be a dangerous one, but I wanted to try it.  This
matched 2710 locations for me, and probably a lot of them are in
critical locations.  When I ran this, I got "NMI Watchdog detected
LOCKUP".  It probably isn't worth trying to make this one work, but if
we have a blacklist, this should be on it.

The second one that failed was 'kernel.function("__switch_to").return'.
This one is a problem with kretprobes only, as all of my other probes in
__switch_to behaved just fine, even in the middle of the function.
Running this gave "Kernel BUG at kprobes:449" (the full dump is included
below).  The line mentioned is in trampoline_probe_handler:

    BUG_ON(!orig_ret_address || (orig_ret_address ==
trampoline_address));

It seems pretty obvious that the actions taken in __switch_to would
conflict with the way the trampoline works.  I don't know if it's
possible to make kretprobes work on this function - if not, it should be
blacklisted.


Josh


crashdump from kernel.function("__switch_to").return
===================================================================
Kernel BUG at kprobes:449
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: stap_4037(U) nfsd exportfs lockd md5 ipv6
parport_pc lp parport autofs4 i2c_dev i2c_core smbfs sunrpc ds
yenta_socket pcmcia_core ipt_REJECT ipt_state ip_conntrack
iptable_filter ip_tables dm_mirror dm_mod button battery ac
joydev uhci_hcd ehci_hcd shpchp hw_random ata_piix snd_azx
snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
soundcore snd_page_alloc sk98lin(U) floppy ext3 jbd ahci libata
sd_mod scsi_mod
Pid: 4006, comm: sshd Tainted: GF     2.6.9-22.ELsmp
RIP: 0010:[<ffffffff80121809>]
<ffffffff80121809>{trampoline_probe_handler+130}
RSP: 0018:0000010071d81b28  EFLAGS: 00010002
RAX: 0000000000000001 RBX: 000001007f9d4600 RCX: ffffffffa0000000
RDX: 0000000000000001 RSI: 0000010071d81bf8 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000010037e14000 R09: 0000000000000003
R10: 00000000000000ff R11: 0000000000000003 R12: 0000000000000000
R13: 0000010071d81bf8 R14: 0000000000000000 R15: 0000010002c1a5e0
FS:  0000002a96a33280(0000) GS:ffffffff804d3180(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000383c38f070 CR3: 0000000037e18000 CR4: 00000000000006e0
Process sshd (pid: 4006, threadinfo 0000010071d80000, task
0000010076dc3030)
Stack: ffffffff803d16a0 ffffffff803d16a0 ffffffff80121785
0000010071d81bf8
       ffffffff80121786 ffffffff80121751 ffffffff803db140
0000010071d81bb8
       0000000000000002 0000010076dc3030
Call Trace:
       <ffffffff80121785>{kretprobe_trampoline+0}
       <ffffffff80121786>{kretprobe_trampoline+1}
       <ffffffff80121751>{kprobe_handler+559}
       <ffffffff80121a27>{kprobe_exceptions_notify+40}
       <ffffffff801432fb>{notifier_call_chain+31}
       <ffffffff80112460>{do_int3+66}
       <ffffffff80110aed>{error_exit+0}
       <ffffffff80121786>{kretprobe_trampoline+1}
       <ffffffff80121785>{kretprobe_trampoline+0}
       <ffffffff80222bad>{tty_ldisc_try+60}
       <ffffffff80303924>{schedule_timeout+101}
       <ffffffff80222ce0>{tty_ldisc_deref+103}
       <ffffffff8018902b>{do_select+939}
       <ffffffff80188bc5>{__pollwait+0}
       <ffffffff801893aa>{sys_select+820}
       <ffffffff801908fc>{dnotify_parent+34}
       <ffffffff80110052>{system_call+126}

Code: 0f 0b 64 b8 31 80 ff ff ff ff c1 01 49 89 ad 80 00 00 00 e8
RIP <ffffffff80121809>{trampoline_probe_handler+130} RSP
<0000010071d81b28>

^ permalink raw reply	[flat|nested] 12+ messages in thread
* RE: [Bug translator/1276] support more timer varieties
@ 2005-12-08  1:53 Stone, Joshua I
  0 siblings, 0 replies; 12+ messages in thread
From: Stone, Joshua I @ 2005-12-08  1:53 UTC (permalink / raw)
  To: Roland McGrath, Frank Ch. Eigler; +Cc: systemtap

Roland McGrath wrote:
> the closest to "dangerous" places would be inside context switching
> and inside interrupt handling. 

In an effort to stress as many of these cases as I could think of, I
wrote the script included below.  I was unable to get this to fail, even
under heavy system load.  I did my testing on RHEL4U2 x86_64, on a P4
w/HT enabled.  My stress test was a full kernel build with 'make -j20',
while also interacting with the desktop (login/out of gdm, load
openoffice & firefox, etc.).

When I was writing the test script, I did find a couple of probes that
failed, but it was unrelated to "current" dereferencing.  I will follow
up in another email...

If anyone feels this is not representative, please let me know how I can
improve the test and/or load.

It may be a good idea to add this sort of test to the pass-5 suite.
Perhaps a new directory is needed - systemtap.stress?

Thanks,

Josh


current.stp
===================================================================
#!/usr/bin/stap -g

global length
function execlen:long () %{ THIS->__retvalue = strlen(current->comm); %}
probe begin { log("systemtap starting probe") }
probe
    timer.profile,
    kernel.function("__switch_to"),
    %( kernel_vr == "2.6.9-22.ELsmp" %?
       %( arch == "x86_64" %?
          /* the lines before, at, and after the update of pcurrent */
 
kernel.statement("__switch_to@arch/x86_64/kernel/process.c:508"),
 
kernel.statement("__switch_to@arch/x86_64/kernel/process.c:509"),
 
kernel.statement("__switch_to@arch/x86_64/kernel/process.c:510"),
       %)
    %)
    kernel.function("*@kernel/sched.c"),
    kernel.function("*@kernel/sched.c").return,
    module("*").function("*interrupt*"),
    module("*").function("*interrupt*").return
{ length <<< execlen() }
probe end {
    log("systemtap ending probe")
    printf("count=%d\n", @count(length))
    printf("sum=%d\n", @sum(length))
    printf("min=%d\n", @min(length))
    printf("max=%d\n", @max(length))
    printf("avg=%d\n", @avg(length))

    /*
     * Check that the min & max lengths look reasonable.  If any string
was
     * either empty or too big, then the current pointer probably wasn't
     * valid, even though it dereferenced without crashing.
     */
    if (@min(length) > 0) {
        log("systemtap test success")
    } else {
        log("unexpected minimum length")
        log("systemtap test failure")
    }
    if (@max(length) < %( kernel_v >= "2.6.11" %? TASK_COMM_LEN %: 16
%)) {
        log("systemtap test success")
    } else {
        log("unexpected maximum length")
        log("systemtap test failure")
    }
}

^ permalink raw reply	[flat|nested] 12+ messages in thread
* RE: [Bug translator/1276] support more timer varieties
@ 2005-12-06  1:40 Stone, Joshua I
  2005-12-06 15:07 ` Frank Ch. Eigler
  0 siblings, 1 reply; 12+ messages in thread
From: Stone, Joshua I @ 2005-12-06  1:40 UTC (permalink / raw)
  To: Roland McGrath, Frank Ch. Eigler; +Cc: systemtap

Roland McGrath wrote:
> What this really means is that "current" always works, it's just not
> always useful to think about that task.  i.e., in an interrupt
> handler, the interrupt has nothing necessarily to do with the task
> that was current at the time of the interrupt.  I'm fairly sure that
> "current" never yields garbage, or even a task that isn't exactly the
> one that was on the CPU running process-mode code last.

That is my understanding as well... the pointer is always valid, it just
doesn't pertain to the interrupt itself.


Frank Ch. Eigler wrote:
> If it doesn't scare you off, then let's have a volunteer try disarming
> some of those in_interrupt() conditionals (maybe replacing them with a
> quick NULL or other simple pointer validity check), and write a few
> stress tests (probes in uncomfortable spots for "current" usage).

I'll volunteer for this (though others are welcome as well).  Have any
suggested probe points that are particularly uncomfortable?


Josh

^ permalink raw reply	[flat|nested] 12+ messages in thread
* RE: [Bug translator/1276] support more timer varieties
@ 2005-12-02  2:54 Stone, Joshua I
  2005-12-06  1:03 ` Frank Ch. Eigler
  0 siblings, 1 reply; 12+ messages in thread
From: Stone, Joshua I @ 2005-12-02  2:54 UTC (permalink / raw)
  To: Frank Ch. Eigler, systemtap

Frank Ch. Eigler wrote:
> Hi -
> 
>> [...]  timer.profile is always in interrupt, so current->tgid is not
>> accessed.  I tried this without the in_interrupt check, and the pids
>> I got looked fine.  [...]  The reason you might NOT want to read
>> from current in an interrupt is that it may be misleading to the
>> user if they're not expecting this.
> 
> Is that the only reason?  I was under the impression that it is also
> possible for utterly fictional "current" values to be left in place,
> which could cause problems if dereferenced.

AFAIK, current is always valid.  I know in the Vtune driver we
frequently use current during PMU interrupts, and there's never been an
issue.  Not to say that Vtune is difinitive - I scanned the kernel
sources as well, and didn't see anything making checks before accessing
current.  I also didn't find anything through Google.  So unless someone
knows otherwise, or has a reference to cast doubt on this, I think it's
safe.

The "let's not mislead the user" issue may be minor enough that we just
say who cares...

> Or a "I believe `current' is safe to dereference here." flag.

If we determine that there are specific unsafe instances of current,
then yes, this would be nice.

> I am rather embarrassed to suggest this in the interim:
> 
> #      if (unlikely(in_interrupt() &&
> #                   strcmp(CONTEXT->probe_point, "timer.profile")))

Hmm... I think we can at least do:
#      if (unlikely(in_interrupt() &&
#                   (CONTEXT->probe_point != "timer.profile")))

Since they are both literal strings, string-pooling should make them
identical pointers.  We could whitelist probes this way, but...  yuck...

I hope we can get a better answer to the validity of "current".

Josh

^ permalink raw reply	[flat|nested] 12+ messages in thread
[parent not found: <CBDB88BFD06F7F408399DBCF8776B3DC05C039A8@scsmsx403.amr.corp.intel.com>]

end of thread, other threads:[~2005-12-08 22:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-08 21:55 [Bug translator/1276] support more timer varieties Stone, Joshua I
  -- strict thread matches above, loose matches on Subject: below --
2005-12-08 22:01 Keshavamurthy, Anil S
2005-12-08  5:35 Keshavamurthy, Anil S
2005-12-08  2:40 Stone, Joshua I
2005-12-08  1:53 Stone, Joshua I
2005-12-06  1:40 Stone, Joshua I
2005-12-06 15:07 ` Frank Ch. Eigler
2005-12-06 18:47   ` Roland McGrath
2005-12-02  2:54 Stone, Joshua I
2005-12-06  1:03 ` Frank Ch. Eigler
2005-12-06  1:11   ` Roland McGrath
     [not found] <CBDB88BFD06F7F408399DBCF8776B3DC05C039A8@scsmsx403.amr.corp.intel.com>
2005-12-01 21:38 ` Frank Ch. Eigler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).