RE: double fault

public inbox for systemtap@sourceware.org

* RE: double fault
@ 2005-11-24  2:37 Stone, Joshua I
  0 siblings, 0 replies; 10+ messages in thread
From: Stone, Joshua I @ 2005-11-24  2:37 UTC (permalink / raw)
  To: Roland McGrath, Martin Hunt; +Cc: systemtap

Roland McGrath wrote:
> The second crash had an esp of 0xf5bd4f98.  If that's a proper stack
> pointer, it's only 104 bytes from the beginning of the stack. 
> Considering that the trap frame itself is 60 bytes, that's fairly
> small for a realistic stack.  It might well be that in fact it's an
> overflowed stack that grew down from below 0xf5bd6000 and overflowed
> by getting below 0xf5bd5034 (which is the end of the struct
> thread_info at the base of the stack). 

I added a check to monitor the stack on the probe entrance, like this:

	unsigned left = (unsigned)CONTEXT->regs & 0xfff;
	printk("stap_debug: %u bytes on the stack\n", left);

Once I added that, I started getting only a single output and then a
crash every time.  The value reported is consistently 3976 bytes - only
120 bytes from the top.  And the eip is now consistently at that stack
read within do_page_fault as well.

>> Is there a way I can get the double-fault to print a full oops, with
>> a stack trace?
> 
> No, it's a special trap handler that uses its own stack and just has
> the simple printks you've seen.  You'd have to do something like put
> a probe on the line in doublefault_fn where it printk's the esp et
> al, and have that call show_trace on t->esp or something.

A probe here doesn't work.  I tried it, and the system hung up
completely (a triple-fault?).  I think things must be hosed up pretty
bad by the time it gets to doublefault_fn.

And thanks to the infinite wisdom of Linus, it's a pain to get a
debugger in there.  I tried kdb first, but kdb doesn't automatically
catch double-faults.  I put a breakpoint on doublefault_fn, and it
triggered, but kdb just panicked about invalid memory references as it
was trying to take over.  Again, to me this seems to indicate trouble
with the stack.  I couldn't get kgdb to work at all on the RHEL4 kernel
- likely patching issues.

Martin Hunt wrote:
> But I'm not sure it's worth pursuing further because it appears to not
> happen in the newer version of kprobes.

Perhaps, or perhaps there's still a landmine in there that is just
better obscured in the newer kprobes.  I would feel much better if there
was a known fix that occurred, instead of the problem magically
disappearing.  I don't think I will spend much more time on this though,
at least until someone runs into the same issue on the new kprobes.

At the very least, judging by the side conversations, we now appear to
have quite a few people looking closely at the fault handling code...

Thanks,

Josh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: double fault
@ 2005-11-22  3:46 Stone, Joshua I
  2005-11-22 11:00 ` Roland McGrath
  0 siblings, 1 reply; 10+ messages in thread
From: Stone, Joshua I @ 2005-11-22  3:46 UTC (permalink / raw)
  To: Roland McGrath; +Cc: systemtap

>From: Roland McGrath [mailto:roland@redhat.com]
>
>The stack overflow notion sounds plausible.  To investigate that angle, one
>thing to try comes to mind off hand.  In each probe that might be hitting,
>stick some %{ ... %} code to do a "stack getting small" check.  It can do
>something like:
>
>	unsigned left = (unsigned)regs & 0xfff;
>	if (left < 256) panic("stack getting close");
>
>That might manage to print out a full oops with backtrace details that show
>the cascade of page fault frames or whatever the situation actually is.
>
>Thanks,
>Roland

I tried the code you gave (using CONTEXT->regs), but I don't understand
how that computes how much stack space is left.  Shouldn't it be
CONTEXT->regs->esp?  And even then, you can see the two esp's from the
register dumps I gave - the first would have triggered your panic, and
the second wouldn't.  Am I missing something?

Anyway, I tried it both ways.  It immediately panics, but there's no
oops info.  It just says "Kernel panic - not syncing".  I added a
dump_stack call, but that all looks innocent.

Is there a way I can get the double-fault to print a full oops, with a
stack trace?

I'm pretty new to kernel-debugging, so sorry if I'm asking simple
questions...

Thanks,

Josh

* RE: double fault
  2005-11-22  3:46 Stone, Joshua I
@ 2005-11-22 11:00 ` Roland McGrath
  0 siblings, 0 replies; 10+ messages in thread
From: Roland McGrath @ 2005-11-22 11:00 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

> Shouldn't it be CONTEXT->regs->esp?

Nope.  For kernel traps the sp and ss are not saved by the i386 hardware, so
that part of the struct pt_regs is not actually there.  However, that
struct itself is the trap frame of the registers that are pushed on the
stack and so it is a stack address near the sp at the time of the fault.

> I tried the code you gave (using CONTEXT->regs), but I don't understand
> how that computes how much stack space is left.  

The stacks are 4k and aligned, so & 0xfff is that sp relative to the base
of the stack.  If sp & 0xfff is very tiny, then the stack is about to
overflow.
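
As a rough illustration of the masking arithmetic described above, here is a
user-space sketch in C.  The names STACK_SIZE, stack_offset and
stack_nearly_full are made up for this example; the 4k aligned stack is taken
from the explanation above, and the 256-byte threshold from the suggested
panic check.  It only does the arithmetic and is not kernel code.

```c
#include <stdint.h>

/* Assumption from the thread: kernel stacks are 4k and 4k-aligned. */
#define STACK_SIZE 4096u

/* Offset of a stack address above the base (lowest address) of its stack. */
static inline uint32_t stack_offset(uint32_t addr)
{
	return addr & (STACK_SIZE - 1);
}

/* Nonzero when fewer than `threshold` bytes remain below addr. */
static inline int stack_nearly_full(uint32_t addr, uint32_t threshold)
{
	return stack_offset(addr) < threshold;
}
```

Fed the two esp values from the register dumps, stack_offset(0xf4b6500c) is
12, which is under the 256-byte threshold, while stack_offset(0xf5bd4f98) is
3992, which is not.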

> And even then, you can see the two esp's from the register dumps I gave -
> the first would have triggered your panic, and the second wouldn't.  

The second crash had an esp of 0xf5bd4f98.  If that's a proper stack
pointer, it's only 104 bytes from the beginning of the stack.  Considering
that the trap frame itself is 60 bytes, that's fairly small for a realistic
stack.  It might well be that in fact it's an overflowed stack that grew
down from below 0xf5bd6000 and overflowed by getting below 0xf5bd5034
(which is the end of the struct thread_info at the base of the stack).
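
The two readings of esp = 0xf5bd4f98 can be checked with a few lines of C.
This is only a sketch of the arithmetic: the helper names are hypothetical,
and the address 0xf5bd5034 (end of struct thread_info) is the one given
above.

```c
#include <stdint.h>

#define STACK_SIZE 4096u	/* 4k aligned stacks, per the thread */

/* Bytes between addr and the top (highest address) of its own 4k stack. */
static inline uint32_t bytes_below_top(uint32_t addr)
{
	return STACK_SIZE - (addr & (STACK_SIZE - 1));
}

/* How far addr lies below `limit`; 0 if it is at or above it. */
static inline uint32_t bytes_past_limit(uint32_t addr, uint32_t limit)
{
	return addr < limit ? limit - addr : 0;
}
```

Under the first reading, bytes_below_top(0xf5bd4f98) is 104: a nearly empty
stack living in the page that ends at 0xf5bd4fff.  Under the second,
bytes_past_limit(0xf5bd4f98, 0xf5bd5034) is 156: a stack in the page above
that has run 156 bytes past the end of its thread_info.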

Of course, it's all just speculation that stack overflow is the issue.

> Is there a way I can get the double-fault to print a full oops, with a
> stack trace?

No, it's a special trap handler that uses its own stack and just has the
simple printks you've seen.  You'd have to do something like put a probe on
the line in doublefault_fn where it printk's the esp et al, and have that
call show_trace on t->esp or something.

Thanks,
Roland

* double fault
@ 2005-11-22  1:12 Stone, Joshua I
  2005-11-22  1:25 ` Roland McGrath
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Stone, Joshua I @ 2005-11-22  1:12 UTC (permalink / raw)
  To: systemtap

I am seeing sporadic double-faults when running tests on systemtap.  I
am trying to run systemtap.base/lt.exp, though others fail as well.  It
doesn't always fail, but if I run it four or five times in succession
that's usually enough to trigger the fault.  Below are manual copies of
a couple of the faults dumped to the console:

double fault, gdt at c0358000 [255 bytes]
double fault, tss at c03dc000
eip = ffffffff, esp = f4b6500c
eax = ffffffff, ebx = ffffffff, ecx = 0000007b, edx = f4b65018
esi = ffffffff, edi = ffffffff, ebp = 00000000

double fault, gdt at c0358000 [255 bytes]
double fault, tss at c03dc000
eip = c011a799, esp = f5bd4f98
eax = f959a380, ebx = f5bd5170, ecx = 0000007b, edx = f4bd505c
esi = 00000000, edi = c011a785, ebp = 00000000

The first dump doesn't tell much, but the edi and eip values in the
second dump are interesting.  'c011a785' is the beginning of
do_page_fault, and the instruction at 'c011a799' is a read from the
stack.  Methinks the stack runneth over?

This is on RHEL4 U2, i686, kernel 2.6.9-22.EL.  I verified this crash on
two different machines with this kernel: an IBM T42 laptop (1.7GHz
Pentium M, 1GB RAM), and a desktop (3.6GHz Pentium 4 HT/EM64T, 2GB RAM).
I couldn't reproduce the problem with the 2.6.9-22.ELsmp kernel.  I also
tried the desktop in x86_64 mode, and could not reproduce the problem
with the UP kernel nor the SMP kernel.

Please let me know if there's any other information I can provide to
help track this down...

Thanks,

Josh Stone

* Re: double fault
  2005-11-22  1:12 Stone, Joshua I
@ 2005-11-22  1:25 ` Roland McGrath
  2005-11-22  9:29 ` Richard J Moore
  2005-11-23  8:34 ` Martin Hunt
  2 siblings, 0 replies; 10+ messages in thread
From: Roland McGrath @ 2005-11-22  1:25 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

The stack overflow notion sounds plausible.  To investigate that angle, one
thing to try comes to mind off hand.  In each probe that might be hitting,
stick some %{ ... %} code to do a "stack getting small" check.  It can do
something like:

	unsigned left = (unsigned)regs & 0xfff;
	if (left < 256) panic("stack getting close");

That might manage to print out a full oops with backtrace details that show
the cascade of page fault frames or whatever the situation actually is.

Thanks,
Roland

* Re: double fault
  2005-11-22  1:12 Stone, Joshua I
  2005-11-22  1:25 ` Roland McGrath
@ 2005-11-22  9:29 ` Richard J Moore
  2005-11-23  8:34 ` Martin Hunt
  2 siblings, 0 replies; 10+ messages in thread
From: Richard J Moore @ 2005-11-22  9:29 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

We need to distinguish between recursive behaviour that's causing stack
depletion and insufficient stack space. If you browse the stack do you see:

1) a great chunk of unused space, or
2) a regular pattern of return addresses

If you follow the stack frames are there any huge jumps - indicating
excessive amounts of local data allocation?
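
One way to look for such jumps, sketched as plain C that could be run over
frame addresses copied out of a dump by hand.  The function name, the input
format (frame addresses listed from innermost to outermost) and the idea of
reporting the largest gap are assumptions for illustration, not an existing
tool.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Given frame addresses ordered from innermost (lowest address) to
 * outermost (highest address), return the largest distance between
 * neighbouring frames.  Returns 0 when there are fewer than two frames.
 */
static inline uint32_t largest_frame_gap(const uint32_t *frames, size_t n)
{
	uint32_t max = 0;
	size_t i;

	for (i = 1; i < n; i++) {
		/* The stack grows down, so outer frames sit higher up. */
		uint32_t gap = frames[i] - frames[i - 1];
		if (gap > max)
			max = gap;
	}
	return max;
}
```

Many small, evenly sized gaps would suggest recursion, while one gap of a
kilobyte or more would suggest a large local allocation in a single frame.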

- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072

From:    "Stone, Joshua I" <joshua.i.stone@intel.com>
Sent by: systemtap-owner@sourceware.org
To:      <systemtap@sources.redhat.com>
Date:    22/11/2005 01:12
Subject: double fault

I am seeing sporadic double-faults when running tests on systemtap.  I
am trying to run systemtap.base/lt.exp, though others fail as well.  It
doesn't always fail, but if I run it four or five times in succession
that's usually enough to trigger the fault.  Below are manual copies of
a couple of the faults dumped to the console:

double fault, gdt at c0358000 [255 bytes]
double fault, tss at c03dc000
eip = ffffffff, esp = f4b6500c
eax = ffffffff, ebx = ffffffff, ecx = 0000007b, edx = f4b65018
esi = ffffffff, edi = ffffffff, ebp = 00000000

double fault, gdt at c0358000 [255 bytes]
double fault, tss at c03dc000
eip = c011a799, esp = f5bd4f98
eax = f959a380, ebx = f5bd5170, ecx = 0000007b, edx = f4bd505c
esi = 00000000, edi = c011a785, ebp = 00000000

The first dump doesn't tell much, but the edi and eip values in the
second dump are interesting.  'c011a785' is the beginning of
do_page_fault, and the instruction at 'c011a799' is a read from the
stack.  Methinks the stack runneth over?

This is on RHEL4 U2, i686, kernel 2.6.9-22.EL.  I verified this crash on
two different machines with this kernel: an IBM T42 laptop (1.7GHz
Pentium M, 1GB RAM), and a desktop (3.6GHz Pentium 4 HT/EM64T, 2GB RAM).
I couldn't reproduce the problem with the 2.6.9-22.ELsmp kernel.  I also
tried the desktop in x86_64 mode, and could not reproduce the problem
with the UP kernel nor the SMP kernel.

Please let me know if there's any other information I can provide to
help track this down...

Thanks,

Josh Stone

* Re: double fault
  2005-11-22  1:12 Stone, Joshua I
  2005-11-22  1:25 ` Roland McGrath
  2005-11-22  9:29 ` Richard J Moore
@ 2005-11-23  8:34 ` Martin Hunt
  2005-11-23 17:21   ` Mathieu Desnoyers
  2 siblings, 1 reply; 10+ messages in thread
From: Martin Hunt @ 2005-11-23  8:34 UTC (permalink / raw)
  To: Stone, Joshua I; +Cc: systemtap

[-- Attachment #1: Type: text/plain, Size: 1748 bytes --]

On Mon, 2005-11-21 at 17:12 -0800, Stone, Joshua I wrote: 
> I am seeing sporadic double-faults when running tests on systemtap.  I
> am trying to run systemtap.base/lt.exp, though others fail as well.  It
> doesn't always fail, but if I run it four or five times in succession
> that's usually enough to trigger the fault.  Below are manual copies of
> a couple of the faults dumped to the console:

Sorry I didn't respond sooner. I've been a bit slow the last couple days
due to the flu.

This looks like the same double-fault I've been seeing sporadically on
my laptop running RHEL4 (and nowhere else).  I tried a couple of ways to
track it down but it isn't easy.  I never did get my laptop working with
netdump either.

It appeared to me that the faults were originating in kprobes. In fact
the same OS on the same hardware with the scalability patches does not
have this problem.

I stripped down the generated C file to something very small that still
demonstrated the problem. Basically it has the giant context array and a
sets a single kprobe on sys_open that simply returns.

Changing the kprobe to other functions does not always trigger the bug.

The problem also has something to do with the size of the context array.
Changing NR_CPUS to 128 (which makes the array really huge) was enough
to cause the double fault to happen on all my RHEL machines (including
x86_64) except for ones running under vmware. I changed the code to use
vmalloc (we really want vmalloc_node() but RHEL4 doesn't have it) and
all the crashes stopped on every machine.
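
To get a feel for how large the static array becomes, the layout can be
mirrored in a user-space program.  This is an approximation of the attached
stap_crash.c, not the file itself: atomic_t is replaced by int, struct
pt_regs is left opaque, the empty probe_0 member of the union is dropped, and
the 36 temporaries are folded into one array.  Exact sizes depend on pointer
width and padding.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXNESTING   30
#define MAXSTRINGLEN 128
#define NCONTEXTS    128	/* the NR_CPUS value used in the experiment */

typedef char string_t[MAXSTRINGLEN];

/* Locals of the one probed function: bs, f, __tmp0..__tmp35, __retvalue. */
struct open_mode_str_locals {
	string_t bs;
	int64_t f;
	string_t tmp[36];
	string_t retvalue;
};

struct context {
	int busy;			/* stands in for atomic_t */
	const char *probe_point;
	unsigned actioncount;
	unsigned nesting;
	const char *last_error;
	const char *last_stmt;
	void *regs;			/* stands in for struct pt_regs * */
	struct open_mode_str_locals locals[MAXNESTING];
};

/* Total bytes taken by a static array of NCONTEXTS contexts. */
static inline size_t contexts_bytes(void)
{
	return NCONTEXTS * sizeof(struct context);
}
```

Each locals entry is at least 38 * 128 + 8 = 4872 bytes, so with 30 nesting
levels and 128 contexts the array comes to at least 18,708,480 bytes, or
roughly 17.8 MiB of static data.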

Confused yet? I've attached my simple C file that triggers the bug. But
I'm not sure it's worth pursuing further because it appears to not happen
in the newer version of kprobes.

Martin



[-- Attachment #2: stap_crash.c --]
[-- Type: text/x-csrc, Size: 2859 bytes --]

#define MAXNESTING 30
#define MAXSTRINGLEN 128
#define STP_STRING_SIZE MAXSTRINGLEN
#include "runtime.h"
#include <linux/string.h>
#include <linux/timer.h>
#include "loc2c-runtime.h" 
typedef char string_t[MAXSTRINGLEN];

struct context {
  atomic_t busy;
  const char *probe_point;
  unsigned actioncount;
  unsigned nesting;
  const char *last_error;
  const char *last_stmt;
  struct pt_regs *regs;
  union {
    struct probe_0_locals {
    } probe_0;
    struct function_my_sys_open_mode_str_locals {
      string_t bs;
      int64_t f;
      string_t __tmp0;
      string_t __tmp1;
      string_t __tmp2;
      string_t __tmp3;
      string_t __tmp4;
      string_t __tmp5;
      string_t __tmp6;
      string_t __tmp7;
      string_t __tmp8;
      string_t __tmp9;
      string_t __tmp10;
      string_t __tmp11;
      string_t __tmp12;
      string_t __tmp13;
      string_t __tmp14;
      string_t __tmp15;
      string_t __tmp16;
      string_t __tmp17;
      string_t __tmp18;
      string_t __tmp19;
      string_t __tmp20;
      string_t __tmp21;
      string_t __tmp22;
      string_t __tmp23;
      string_t __tmp24;
      string_t __tmp25;
      string_t __tmp26;
      string_t __tmp27;
      string_t __tmp28;
      string_t __tmp29;
      string_t __tmp30;
      string_t __tmp31;
      string_t __tmp32;
      string_t __tmp33;
      string_t __tmp34;
      string_t __tmp35;
      string_t __retvalue;
    } function_my_sys_open_mode_str;
  } locals [MAXNESTING];
} contexts [128];


static struct kprobe dwarf_kprobe_0[1]= {
  {.addr= (void *) 0xc016765e}
};

char const * dwarf_kprobe_0_location_names[1] = {
  "kernel.function(\"sys_open@fs/open.c:947\")"
};

static int 
dwarf_kprobe_0_enter (struct kprobe *probe_instance, struct pt_regs *regs) {
  return 0;
}

static int systemtap_module_init (void);
int systemtap_module_init () {
  int rc = 0;
  const char *probe_point = "";
  /* register probe #0, 1 location(s) */
  probe_point = "kernel.function(\"sys_open@fs/open.c:947\")";
  {
    int i;
    /* sizeof yields size_t, so cast to match the format specifier */
    printk("in module_init() contexts = %lu\n", (unsigned long) sizeof(contexts));
    for (i = 0; i < 1; i++) {
      ssleep(5);
      dwarf_kprobe_0[i].pre_handler = &dwarf_kprobe_0_enter;
      rc = rc || register_kprobe (&(dwarf_kprobe_0[i]));
      if (unlikely (rc)) {
        probe_point = dwarf_kprobe_0_location_names[i];
        break;
      }
      printk("probe registered\n");
    }
    if (unlikely (rc)) while (--i >= 0)
      unregister_kprobe (&(dwarf_kprobe_0[i]));
  }
  
  printk("DONE rc=%d\n", rc);
  ssleep(5);
  return rc;
}

void systemtap_module_exit (void) {
  int i;
  for (i = 0; i < 1; i++)
    unregister_kprobe (&(dwarf_kprobe_0[i]));
}

int probe_start () {
  return systemtap_module_init () ? -1 : 0;
}

void probe_exit () {
  systemtap_module_exit ();
}

MODULE_DESCRIPTION("systemtap probe");
MODULE_LICENSE("GPL");

* Re: double fault
  2005-11-23  8:34 ` Martin Hunt
@ 2005-11-23 17:21   ` Mathieu Desnoyers
  2005-11-23 17:54     ` Martin Hunt
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Desnoyers @ 2005-11-23 17:21 UTC (permalink / raw)
  To: Martin Hunt; +Cc: Stone, Joshua I, systemtap

* Martin Hunt (hunt@redhat.com) wrote:
> Changing the kprobe to other functions does not always trigger the bug.
> 
> The problem also has something to do with the size of the context array.
> Changing NR_CPUS to 128 (which makes the array really huge) was enough
> to cause the double fault to happen on all my RHEL machines (including
> x86_64) except for ones running under vmware. I changed the code to use
> vmalloc (we really want vmalloc_node() but RHEL4 doesn't have it) and
> all the crashes stopped on every machine.
> 

What are the flags used for the memory allocated by vmalloc ?

Did you try : 

- allocating the memory with kmalloc instead of vmalloc ?
- to see if there is a code path that goes from do_page_fault to sys_open ? I
  would be surprised about it, but we never know...

Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

* Re: double fault
  2005-11-23 17:21   ` Mathieu Desnoyers
@ 2005-11-23 17:54     ` Martin Hunt
  2005-11-23 18:09       ` Mathieu Desnoyers
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Hunt @ 2005-11-23 17:54 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Stone, Joshua I, systemtap

On Wed, 2005-11-23 at 12:21 -0500, Mathieu Desnoyers wrote:
> * Martin Hunt (hunt@redhat.com) wrote:
> > I changed the code to use
> > vmalloc (we really want vmalloc_node() but RHEL4 doesn't have it) and
> > all the crashes stopped on every machine.
> > 
> 
> What are the flags used for the memory allocated by vmalloc ?

From mm/vmalloc.c:
void *vmalloc(unsigned long size)
{
       return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL);
}


> Did you try : 
> 
> - allocating the memory with kmalloc instead of vmalloc ?

Why would I do that?  What would I look for?  vmalloc already works.

> - to see if there is a code path that goes from do_page_fault to sys_open ? I
>   would be surprised about it, but we never know...

I think we can assume that files aren't being opened in do_page_fault.
But I checked anyway and I don't see anything like that.

Martin


* Re: double fault
  2005-11-23 17:54     ` Martin Hunt
@ 2005-11-23 18:09       ` Mathieu Desnoyers
  0 siblings, 0 replies; 10+ messages in thread
From: Mathieu Desnoyers @ 2005-11-23 18:09 UTC (permalink / raw)
  To: Martin Hunt; +Cc: Stone, Joshua I, systemtap

* Martin Hunt (hunt@redhat.com) wrote:
> > Did you try : 
> > 
> > - allocating the memory with kmalloc instead of vmalloc ?
> 
> Why would I do that?  What would I look for?  vmalloc already works.
> 

Memory allocated by vmalloc will generate a minor page fault when accessed from
kernel space on behalf of a different process. The page fault handler will
simply update the process's page table in that case. If the page fault handler is
instrumented for a minor fault and the logging code generates a page fault, it
clearly causes a double fault.

But as you only instrument sys_open, this case does not apply.

Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 
