public inbox for systemtap@sourceware.org
* [Bug kprobes/4420] New: systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: wcohen at redhat dot com @ 2007-04-24 19:07 UTC
  To: systemtap

Looking at why the RHEL4U4 i686 machine (2.6.9-42.0.10.EL i686 kernel) is
dying during snapshot testing. Some of the kernel error messages look similar
to an earlier, closed bug, 2726. However, the tests from 2726 work, so the
problem needs to be narrowed down further. Looking through the testsuite's
systemtap.log, the test that is crashing the machine is:

Running
/home/wcohen/stap_testing_200704240830/src/testsuite/systemtap.samples/lket.exp ...

From systemtap.log:

Running
/home/wcohen/stap_testing_200704240830/src/testsuite/systemtap.samples/lket.exp ...
Pass 1: parsed user script and 54 library script(s) in 740usr/30sys/805real ms.

Pass 2: analyzed script: 857 probe(s), 310 function(s), 24 embed(s), 132
global(s) in 70000usr/170sys/70548real ms.

Pass 3: translated to C into
"/tmp/stapfJv2vF/stap_997dc0e24dfee70b8a1d0811d8a016a9_641435.c" in
810usr/20sys/832real ms.


Taking a look at the directory:

$ ls -l /tmp/stapfJv2vF/
total 9656
-rw-r--r--  1 wcohen wcohen     795 Apr 24 04:58 Makefile
-rw-r--r--  1 wcohen wcohen 4378845 Apr 24 04:58 stap_997dc0e24dfee70b8a1d0811d8a016a9_641435.c
-rw-r--r--  1 wcohen wcohen 2722463 Apr 24 04:59 stap_997dc0e24dfee70b8a1d0811d8a016a9_641435.ko
-rw-r--r--  1 wcohen wcohen    3067 Apr 24 04:59 stap_997dc0e24dfee70b8a1d0811d8a016a9_641435.mod.c
-rw-r--r--  1 wcohen wcohen   34604 Apr 24 04:59 stap_997dc0e24dfee70b8a1d0811d8a016a9_641435.mod.o
-rw-r--r--  1 wcohen wcohen 2688948 Apr 24 04:59 stap_997dc0e24dfee70b8a1d0811d8a016a9_641435.o

Pretty big module. The following caused the kernel to crash:

 sudo /home/wcohen/stap_testing_200704240830/install/bin/staprun \
   /tmp/stapfJv2vF/stap_997dc0e24dfee70b8a1d0811d8a016a9_641435.ko


Oops output on console.



slingshot.devel.redhat.com login: Kernel panic - not syncing: kernel/module.c:24
<0>Kernel panic - not syncing: kernel/sched.c:2430: spin_lock(kernel/sched.c:c5
Badness in panic at kernel/panic.c:118                                         
 [<c0123ea0>] panic+0x135/0x142
 [<c011fdbc>] scheduler_tick+0x21d/0x4aa
 [<c012e6a3>] do_timer+0x29/0xb5
 [<c010d080>] timer_interrupt+0x165/0x25a
 [<c0107f00>] handle_IRQ_event+0x25/0x4f
 [<c01088ce>] do_IRQ+0x18a/0x2bf
 =======================
 [<c01e7149>] search_extable+0x1f/0x36
 [<c03198c4>] common_interrupt+0x18/0x20
 [<c01e7149>] search_extable+0x1f/0x36
 [<c0123e5a>] panic+0xef/0x142
 [<c01e7149>] search_extable+0x1f/0x36
 [<c0141690>] search_module_extables+0x6d/0x13b
 [<c01e7149>] search_extable+0x1f/0x36
 [<c0139dcb>] search_exception_tables+0x1f/0x21
 [<c011e1d3>] fixup_exception+0xb/0x20
 [<c011c300>] kprobe_exceptions_notify+0x187/0x19b
 [<c01348a1>] notifier_call_chain+0x17/0x2e
 [<c011d7d9>] do_page_fault+0x0/0x4dc
 [<c011d82b>] do_page_fault+0x52/0x4dc
 [<c0317627>] __cond_resched+0x14/0x3b
 [<c016f41e>] __getblk+0x2b/0x49
 [<e0822903>] ext3_get_inode_loc+0x4f/0x223 [ext3]
 [<c01089f7>] do_IRQ+0x2b3/0x2bf
 [<c03198c4>] common_interrupt+0x18/0x20
 [<c011d7d9>] do_page_fault+0x0/0x4dc
 [<c0319983>] error_code+0x2f/0x38
 [<e0dab333>]<0>Kernel panic - not syncing: kernel/module.c:2114: spin_lock(ker4

-- 
           Summary: systemtap.samples/lket.exp test crashing RHEL4U4 machine
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: critical
          Priority: P2
         Component: kprobes
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: wcohen at redhat dot com
GCC target triplet: i386-linux


http://sourceware.org/bugzilla/show_bug.cgi?id=4420

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug kprobes/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: dsmith at redhat dot com @ 2007-04-24 19:51 UTC
  To: systemtap


------- Additional Comments From dsmith at redhat dot com  2007-04-24 20:51 -------
For me, the test doesn't even compile on RHEL4U4 (with the latest CVS stap) on
2.6.9-42.0.10.EL (x86_64):

# ./stap -k -p4 -v ../src/testsuite/systemtap.samples/lket.stp 
Pass 1: parsed user script and 54 library script(s) in 150usr/10sys/176real ms.
semantic error: unable to find local 'new' near pc 0xffffffff80359f9a
(alternatives: rq prev next): identifier '$new' at
/usr/local/share/systemtap/tapset/scheduler.stp:138:21
semantic error: unable to find local 'new' near pc 0xffffffff80359f9a
(alternatives: rq prev next): identifier '$new' at
/usr/local/share/systemtap/tapset/scheduler.stp:140:21
Pass 2: analyzed script: 882 probe(s), 834 function(s), 24 embed(s), 132
global(s) in 19460usr/100sys/19588real ms.
Pass 2: analysis failed.  Try again with more '-v' (verbose) options.
Keeping temporary directory "/tmp/stapXzpNYA"


* [Bug kprobes/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: wcohen at redhat dot com @ 2007-04-24 20:04 UTC
  To: systemtap


------- Additional Comments From wcohen at redhat dot com  2007-04-24 21:04 -------
The crash occurred on RHEL4U4 i686; systemtap.samples/lket.stp failed to
compile on the x86_64 machine. I did a bit more testing to try to narrow
down the problem. Below is a table of results per file (the problems are
nfs*.stp and syscalls.stp):

File                             Result
aio.stp                          okay
hookid_defs.stp                  NA
ioscheduler.stp                  okay
iosyscall.stp                    okay
lket_trace.stp                   NA
netdev.stp                       okay
nfsd.stp/nfs_proc.stp/nfs.stp    Crashed (PROBLEM)
pagefault.stp                    failed compilation
process.stp                      okay
register_event.stp               NA
rpc.stp                          okay
scsi.stp                         okay
signal.stp                       okay
syscalls.stp                     Resets machine/crashes machine (PROBLEM)
timestamp.stp                    NA
tskdispatch.stp                  okay
utils.stp                        NA



* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: wcohen at redhat dot com @ 2007-04-27 20:40 UTC
  To: systemtap


------- Additional Comments From wcohen at redhat dot com  2007-04-27 21:39 -------
When the test is run with -DASCII_TRACE, it does not seem to crash the machine.
Looking through the code, the following idiom appears in many places:

probe addevent.nfs.fop.llseek.return
      += _addevent.nfs.fop.llseek.return
{
	update_record()
}

probe _addevent.nfs.fop.llseek.return
      = nfs.fop.llseek.return
{
	log_nfs_return(HOOKID_NFS_FOP_LLSEEK_RETURN,$return)
}

When -DASCII_TRACE is used on the command line, update_record() becomes an
empty function, so the order in which the probes at a given probe point run no
longer matters. It seems like it would make sense to merge those probes
together to make sure that the operations are performed in the correct order.
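The ordering question can be made concrete with a small Python model of how the two alias styles above expand. It assumes SystemTap's documented probe-alias semantics: in `probe A = B { ... }` (prologue style) the alias body runs before the user's probe body, and in `probe A += B { ... }` (epilogue style) it runs after. The `ALIASES` table and `expand()` helper are hypothetical illustration, not SystemTap internals.

```python
# Hypothetical model of SystemTap probe-alias expansion; not SystemTap code.
# Assumed semantics: "=" aliases prepend their body (prologue) and "+="
# aliases append their body (epilogue) to the user's probe body.

ALIASES = {
    # alias name: (probe point it expands to, style, alias body)
    "addevent.nfs.fop.llseek.return":
        ("_addevent.nfs.fop.llseek.return", "+=", ["update_record()"]),
    "_addevent.nfs.fop.llseek.return":
        ("nfs.fop.llseek.return", "=",
         ["log_nfs_return(HOOKID_NFS_FOP_LLSEEK_RETURN,$return)"]),
}


def expand(point, user_body):
    """Resolve aliases; return (concrete probe point, ordered statements)."""
    body = list(user_body)
    while point in ALIASES:
        target, style, alias_body = ALIASES[point]
        if style == "=":
            body = alias_body + body      # prologue: alias body runs first
        else:
            body = body + alias_body      # epilogue: alias body runs last
        point = target
    return point, body


point, stmts = expand("addevent.nfs.fop.llseek.return", [])
print(point)
for stmt in stmts:
    print("   ", stmt)
```

Under these assumptions, a user probe on addevent.nfs.fop.llseek.return becomes a single handler on nfs.fop.llseek.return that calls log_nfs_return() and then update_record(), with any user statements in between.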

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|kprobes                     |lket



* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: fche at redhat dot com @ 2007-04-28 11:58 UTC
  To: systemtap


------- Additional Comments From fche at redhat dot com  2007-04-28 12:58 -------
(In reply to comment #3)
> When running the test with -DASCII_TRACE it does not seem to crash the machine.

That is valuable information, and seems to point the finger toward the
runtime, no?

> Looking through the code the following idom is seen in much of the code:
> [...]
> When -DASCII_TRACE is used on the command line, the update_record() becomes an
> empty function and the order of the probes at a point becomes unimportant. It
> seems like it would make sense to merge those probes together to make sure that
> the operations are performed in the correct order.

I don't understand why you believe there is any problem here.
What incorrect order is possible?



* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: wcohen at redhat dot com @ 2007-04-30 18:36 UTC
  To: systemtap


------- Additional Comments From wcohen at redhat dot com  2007-04-30 19:36 -------
It is possible for the problem to be in the runtime. I will comment out the
various calls to the runtime and see if that fixes the problem. I am also trying
to reduce the number of lket probes to see if smaller examples reproduce the
problem.

It looks like the code expects the probes for a probe point to run in a
particular order. It seems like it would be better to put the update_record()
calls where they are needed rather than having them in separate probes.



* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: wcohen at redhat dot com @ 2007-04-30 20:51 UTC
  To: systemtap


------- Additional Comments From wcohen at redhat dot com  2007-04-30 21:51 -------
I noticed that when running the scripts with "-DASCII_TRACE", the machine did
not crash, but there were messages like the following in the trace:

3|1|1177961852|944213|4318|4317|4318|0|4330|4330|3|systemtap/0ERROR: pointer
dereference fault near identifier '$ppos' at
/home/wcohen/stap_snap_200704301701/install/share/systemtap/tapset/vfs.stp:477:17

I commented out the _stp_printf calls in _lket_trace() in lket_trace.stp and
reran. This time the machine did not crash; instead I got an error message
like the following:

ERROR: pointer dereference fault near identifier '$ppos' at
/home/wcohen/stap_snap_200704301701/install/share/systemtap/tapset/vfs.stp:477:17
WARNING: Number of errors: 1, skipped probes: 1
Pass 5: run completed in 20usr/1690sys/1753real ms.
Keeping temporary directory "/tmp/stapcIgg9D"


With the _stp_printf calls still commented out, I ran the module repeatedly in
the following loop, and things worked:

 while true; do
   sudo /home/wcohen/stap_snap_200704301701/install/bin/staprun \
     /tmp/stapNbc2MJ/stap_7c521505a03e921392ee3ab981f37877_161620.ko
 done

When the _stp_printf calls in _lket_trace() are active, it crashes. It looks
like there is some issue with _stp_printf, or with the arguments being passed
to _stp_printf, when the binary format is involved. Things worked with
"-DASCII_TRACE".

* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: wcohen at redhat dot com @ 2007-05-25 19:43 UTC
  To: systemtap


------- Additional Comments From wcohen at redhat dot com  2007-05-25 19:43 -------
The kernels have CONFIG_DEBUG_SPINLOCK=y set, and this machine is running a UP
kernel. The spinlock code is for the most part a no-op on UP, but the
CONFIG_DEBUG_SPINLOCK code checks that the processor makes only one attempt to
get a given lock. It appears that with lket.stp, the kprobe code being executed
causes an exception that attempts to grab the lock again. On the UP kernel, the
debug spinlock code panics as a result.

The SMP kernel assumes that more than one attempt can be made for the same
spinlock. When systemtap.samples/lket.stp is run on the SMP build of the same
kernel version, 2.6.9-55, the lket script runs without crashing.
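The UP failure mode described above can be modeled in a few lines of Python. This is a hypothetical sketch, not kernel code: `DebugSpinlockUP` and its messages are made up to resemble the console output in this thread. It shows only why a second acquisition of the same lock, before it is released, is fatal when the debug check is compiled into a UP kernel.

```python
# Hypothetical model of a CONFIG_DEBUG_SPINLOCK-style check on a UP kernel.
# The lock itself does nothing on UP; the debug wrapper only records who
# holds it and panics if it is taken again before being released.

class KernelPanic(Exception):
    pass


class DebugSpinlockUP:
    def __init__(self, name):
        self.name = name
        self.owner = None          # "file:line" of the current holder

    def lock(self, where):
        if self.owner is not None:
            raise KernelPanic(f"{where}: spin_lock({self.name}) already "
                              f"locked by {self.owner}")
        self.owner = where

    def unlock(self, where):
        if self.owner is None:
            raise KernelPanic(f"{where}: spin_unlock({self.name}) not locked")
        self.owner = None


modlist_lock = DebugSpinlockUP("modlist_lock")
modlist_lock.lock("kernel/module.c:2115")       # first fault takes the lock
try:
    modlist_lock.lock("kernel/module.c:2115")   # nested fault takes it again
except KernelPanic as e:
    print("Kernel panic -", e)
```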


* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: wcohen at redhat dot com @ 2007-05-25 19:45 UTC
  To: systemtap


------- Additional Comments From wcohen at redhat dot com  2007-05-25 19:44 -------
Given that all the other machines I run tests on are running SMP kernels, that
would explain why this is the only machine that exhibits this problem.


* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: hunt at redhat dot com @ 2007-06-06 20:27 UTC
  To: systemtap



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hunt at redhat dot com



* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: prasanna at in dot ibm dot com @ 2007-07-05 10:53 UTC
  To: systemtap


------- Additional Comments From prasanna at in dot ibm dot com  2007-07-05 10:52 -------
From the traces below, it looks like two nested page faults are generated while
executing the registered pre_handler().

This happens consistently because of the probe handler registered on __switch_to().

First, the registered stap pre_handler() (enter_kprobe_probe()) executes and
generates a page fault at address 0xd0c73b8b, which ends up calling
fixup_exception()->search_exception_tables()->search_module_extables()->search_extable().
search_module_extables() takes modlist_lock, and then another page fault occurs
at address 0xc01e9a0d. Handling that fault goes through fixup_exception() and
into search_module_extables() again, which tries to grab modlist_lock a second
time. Since this is a uniprocessor kernel, where spinlocks are no-ops, but with
spinlock debugging (SPINLOCK_DEBUG) enabled, it panics with the message below.


-------------> 1st trace
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
Kernel 2.6.9-prep on an i686
k50wks273993wss.in.ibm.com login: fixup exception c010478d, pid = 2755
modlock d0c73b8b 
fixup exception c010478d, pid = 2755
modlock c01e9a0d
kernel/module.c:2115: spin_lock(kernel/module.c:c0370280) already
 locked by kernel/module.c/2115
modunlock c01e9a0d
Kernel panic - not syncing: kernel/module.c:2126:
spin_unlock(kernel/module.c:c0370280) not locked
 <3>kernel/sched.c:2430: spin_lock(kernel/sched.c:c040b5a0) already
 locked by kernel/sched.c/2685

-----------------------> 2nd trace
fixup exception c010478d, pid = 3205
modlock d0c73b8b
fixup exception c010478d, pid = 3205
modlock c01e9a31
 [<c01e9a31>] search_extable+0x1f/0x36
 [<c0141fc6>] search_module_extables+0x23/0x17d
 [<c01e9a31>] search_extable+0x1f/0x36
 [<c013a653>] search_exception_tables+0x1f/0x21
 [<c011e3b3>] fixup_exception+0xb/0x20
 [<c011c481>] kprobe_exceptions_notify+0x1a9/0x1bd
 [<c0135031>] notifier_call_chain+0x17/0x2e
 [<c011d9b5>] do_page_fault+0x0/0x4dc
 [<c011da07>] do_page_fault+0x52/0x4dc
 [<c027b3d1>] ata_output_data+0x60/0x66
 [<c01ed14d>] __delay+0x9/0xa
 [<c02542c4>] serial8250_console_write+0x16c/0x1b2
 [<c0254158>] serial8250_console_write+0x0/0x1b2
 [<c011d9b5>] do_page_fault+0x0/0x4dc
 [<c031e6b7>] error_code+0x2f/0x38

Possible solutions:

1. Disallow probes on __switch_to(), but only on uniprocessor machines.
2. Don't allow page faults; just recover using a setjmp/longjmp() mechanism
   (posted earlier on the systemtap mailing list).

Any other possible solutions/suggestions?


* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: prasanna at in dot ibm dot com @ 2007-07-25 14:13 UTC
  To: systemtap


------- Additional Comments From prasanna at in dot ibm dot com  2007-07-25 13:01 -------
Possible solutions:
1. Use a robust fault-handling mechanism for handlers that access the user
   address space, where the chances of such handlers causing page faults are
   high. A robust fault-handling mechanism will recover from such faults.
2. Blacklist the __switch_to() routine and do not allow probes on it.
3. Restrict the probe handler on __switch_to(), so that probes on
   __switch_to() can still be allowed.
4. Use static markers for __switch_to() routines.
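Option 2, and the uniprocessor-only variant proposed in the earlier comment, both come down to a check made when a probe is registered. A minimal sketch, assuming a hypothetical `PROBE_BLACKLIST` table and `may_register_probe()` helper; neither is an existing kprobes or SystemTap interface.

```python
# Hypothetical registration-time policy; not a kprobes or SystemTap API.
# Maps a symbol to True if the ban applies only on uniprocessor kernels.
PROBE_BLACKLIST = {
    "__switch_to": True,
}


def may_register_probe(symbol, smp_kernel):
    """Return True if a probe on `symbol` should be allowed."""
    if symbol not in PROBE_BLACKLIST:
        return True                      # not listed: always allowed
    up_only = PROBE_BLACKLIST[symbol]
    # A UP-only ban still allows the probe on SMP; a full ban never does.
    return up_only and smp_kernel


for smp in (False, True):
    print("__switch_to, smp =", smp, "->", may_register_probe("__switch_to", smp))
```

Such a check only avoids this one probe point; it does not change what happens when a handler faults somewhere else.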

It makes more sense to identify routines that are frequently executed in the
hot path and use static markers to collect instrumentation data.

cc'ing Josh

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jkenisto at us dot ibm dot
                   |                            |com, joshua dot i dot stone
                   |                            |at intel dot com



* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: joshua dot i dot stone at intel dot com @ 2007-07-25 18:44 UTC
  To: systemtap


------- Additional Comments From joshua dot i dot stone at intel dot com  2007-07-25 17:33 -------
(In reply to comment #10)
> Possible solutions:
> 1. Use robust fault handling mechanism for handler which access user address
> space and chances of such handler causing page-faults are high. Robust fualt
> handling mechanism will recover from such faults.

This should always be the case.  It doesn't matter whether the chance of failure
is "high"  -- if there's *any* chance of failure we need to use fault-handling
mechanisms.
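The principle above, that any handler access which can fault should go through a recovery path, can be sketched in Python with an exception standing in for the setjmp/longjmp-style recovery proposed earlier in the thread. Everything here (`MEMORY`, `DerefFault`, `guarded_read()`, `run_handler()`) is hypothetical; it shows only the control flow: the handler is abandoned at the faulting access, the error is counted, and the probed code carries on.

```python
# Hypothetical sketch of a fault-tolerant probe handler. The exception
# plays the role of a longjmp back to a recovery point; nothing here is
# SystemTap runtime code.

class DerefFault(Exception):
    """Raised by guarded_read() instead of taking an unrecoverable fault."""


MEMORY = {0x1000: 42}            # pretend-mapped addresses and their contents
stats = {"ok": 0, "errors": 0}


def guarded_read(addr):
    if addr not in MEMORY:
        raise DerefFault(f"pointer dereference fault at {addr:#x}")
    return MEMORY[addr]


def handler(ppos_addr):
    # A probe body that dereferences a pointer argument (e.g. $ppos).
    return guarded_read(ppos_addr) + 1


def run_handler(ppos_addr):
    """Recovery point: a fault anywhere in the handler lands here."""
    try:
        handler(ppos_addr)
        stats["ok"] += 1
    except DerefFault as e:
        stats["errors"] += 1     # report and skip this hit instead of crashing
        print("ERROR:", e)


run_handler(0x1000)   # valid pointer: handler completes
run_handler(0x0)      # bad pointer: reported and skipped
print(stats)
```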

> 2. Black list __switch_to() routine and do not allow probes on them.
> 3. Restrick the probe handler to __switch_to(), so that probes on
> __switch_to() can be still allowed.
> 4. Use static markers for __switch_to() routines.

I don't understand why you think __switch_to() itself is to blame.  I haven't
been involved in all of the conversations, but it looks like __switch_to() just
provides a reproducible test-case, not the actual *cause* of the problem.

> It makes more sense to identify routines that get frequently executed in the
> hot path and use static markers to collect instrumenation data.

I agree with this for performance reasons, but it shouldn't make a difference as
a matter of correctness.




* [Bug lket/4420] systemtap.samples/lket.exp test crashing RHEL4U4 machine
From: fche at redhat dot com @ 2007-08-28 15:26 UTC
  To: systemtap


------- Additional Comments From fche at redhat dot com  2007-08-27 20:35 -------
LKET has been retired.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX



Thread overview: 14+ messages
2007-04-24 19:07 [Bug kprobes/4420] New: systemtap.samples/lket.exp test crashing RHEL4U4 machine wcohen at redhat dot com
2007-04-24 19:51 ` [Bug kprobes/4420] " dsmith at redhat dot com
2007-04-24 20:04 ` wcohen at redhat dot com
2007-04-27 20:40 ` [Bug lket/4420] " wcohen at redhat dot com
2007-04-28 11:58 ` fche at redhat dot com
2007-04-30 18:36 ` wcohen at redhat dot com
2007-04-30 20:51 ` wcohen at redhat dot com
2007-05-25 19:43 ` wcohen at redhat dot com
2007-05-25 19:45 ` wcohen at redhat dot com
2007-06-06 20:27 ` hunt at redhat dot com
2007-07-05 10:53 ` prasanna at in dot ibm dot com
2007-07-25 14:13 ` prasanna at in dot ibm dot com
2007-07-25 18:44 ` joshua dot i dot stone at intel dot com
2007-08-28 15:26 ` fche at redhat dot com
