* [Bug kprobes/5963] testsuite/systemtap.maps/pmap_agg_overflow.stp crashes on 2.6.25-0.121.rc5.git4.fc9
2008-03-18 18:01 [Bug kprobes/5963] New: testsuite/systemtap.maps/pmap_agg_overflow.stp crashes on 2.6.25-0.121.rc5.git4.fc9 wcohen at redhat dot com
@ 2008-03-18 22:27 ` fche at redhat dot com
2008-03-19 1:56 ` mhiramat at redhat dot com
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: fche at redhat dot com @ 2008-03-18 22:27 UTC (permalink / raw)
To: systemtap
------- Additional Comments From fche at redhat dot com 2008-03-18 22:27 -------
Note that according to the oops message, the systemtap probe module
has already been unloaded. This int3 in scheduler_tick appears to have
been left behind. Could this be associated with the batch-unregister
kprobe code - if that's in that kernel?
--
http://sourceware.org/bugzilla/show_bug.cgi?id=5963
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From: mhiramat at redhat dot com @ 2008-03-19 1:56 UTC (permalink / raw)
To: systemtap
------- Additional Comments From mhiramat at redhat dot com 2008-03-19 01:56 -------
(In reply to comment #1)
> Could this be associated with the batch-unregister
> kprobe code - if that's in that kernel?
I think it is not in that kernel yet... could you check it in /proc/kallsyms?
$ grep register_kprobes /proc/kallsyms
And have you ever gotten this panic on bare hardware (not on vmware)?
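A minimal sketch of the same check done programmatically, in case it needs to run across several kernels. It assumes the batch API shows up as the symbols register_kprobes and unregister_kprobes, and that kallsyms lines have the usual "address type symbol [module]" layout; the helper names find_symbols and has_batch_kprobes are made up for illustration.

```python
# Symbols assumed to indicate the batch (un)registration API.
BATCH_SYMBOLS = {"register_kprobes", "unregister_kprobes"}


def find_symbols(kallsyms_text, names):
    """Return the subset of `names` present as symbols in kallsyms-style text."""
    found = set()
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Expected layout: "<address> <type> <symbol> [module]".
        if len(fields) >= 3 and fields[2] in names:
            found.add(fields[2])
    return found


def has_batch_kprobes(path="/proc/kallsyms"):
    """True if every assumed batch symbol is listed in the given file."""
    with open(path) as f:
        return BATCH_SYMBOLS <= find_symbols(f.read(), BATCH_SYMBOLS)
```

Unlike the grep, this matches whole symbol names, so the singular register_kprobe does not count as a hit.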
From: mhiramat at redhat dot com @ 2008-03-19 2:10 UTC (permalink / raw)
To: systemtap
------- Additional Comments From mhiramat at redhat dot com 2008-03-19 02:09 -------
(In reply to comment #2)
> (In reply to comment #1)
> > Could this be associated with the batch-unregister
> > kprobe code - if that's in that kernel?
>
> I think it is not in that kernel yet... could you check it in /proc/kallsyms?
I checked (from the src.rpm) that the batch-unregister patch is not in
2.6.25-0.121.rc5.git4.fc9.
From: ananth at in dot ibm dot com @ 2008-03-19 5:56 UTC (permalink / raw)
To: systemtap
------- Additional Comments From ananth at in dot ibm dot com 2008-03-19 05:55 -------
Will,
Was your test on an smp system?
I've seen this crash once on my F9-Alpha... it's a uni-processor system. The test
says it's applicable only for SMPs. This test crashes when run during make
installcheck. But, stap -vvv -g pmap_agg_overflow.stp aborts the run with a warning.
I looked at the stap script... it probes scheduler_tick. Just to verify, I tried
a kprobe module that had a probe on scheduler_tick and it worked without problems.
From: ananth at in dot ibm dot com @ 2008-03-19 9:33 UTC (permalink / raw)
To: systemtap
------- Additional Comments From ananth at in dot ibm dot com 2008-03-19 09:32 -------
From the objdump, it looks like the OOPS is at hlist_for_each_entry_rcu() in
get_kprobe(). Code surrounding this hasn't changed since July 2005!
However, a custom built 2.6.25-rc6 on the F9-Alpha works fine. Strange!
[ananth@... linux-2.6.25-rc6]$ uname -a
Linux ....in.ibm.com 2.6.25-rc6-lean #1 Wed Mar 19 13:53:38 IST 2008 i686 i686
i386 GNU/Linux
[ananth@... linux-2.6.25-rc6]$ stap -V
SystemTap translator/driver (version 0.6.2/0.133 built 2008-03-18)
Copyright (C) 2005-2008 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
Objdump of the same portion on the working kernel is:
00000000 <get_kprobe>:
   0:   89 c1                   mov    %eax,%ecx
   2:   69 c0 01 00 37 9e       imul   $0x9e370001,%eax,%eax
   8:   53                      push   %ebx
   9:   83 ec 04                sub    $0x4,%esp
   c:   89 e3                   mov    %esp,%ebx
   e:   c1 e8 1a                shr    $0x1a,%eax
  11:   8b 04 85 08 00 00 00    mov    0x8(,%eax,4),%eax
  18:   89 04 24                mov    %eax,(%esp)
  1b:   eb 03                   jmp    20 <get_kprobe+0x20>
  1d:   89 14 24                mov    %edx,(%esp)
  20:   8b 03                   mov    (%ebx),%eax
  22:   85 c0                   test   %eax,%eax
  24:   74 0e                   je     34 <get_kprobe+0x34>
  26:   8b 04 24                mov    (%esp),%eax
  29:   8b 10                   mov    (%eax),%edx
  2b:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  2f:   39 48 18                cmp    %ecx,0x18(%eax)
  32:   75 e9                   jne    1d <get_kprobe+0x1d>
  34:   5a                      pop    %edx
  35:   5b                      pop    %ebx
  36:   c3                      ret
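For readers following the listing, here is a rough userspace model of what it appears to do: hash the probed address with a multiplicative constant, keep the top bits as a bucket index, then walk that bucket comparing a stored address. The constant 0x9e370001 and the shift 0x1a are taken from the listing; reading offset 0x18 as the probe address field is an assumption, and ProbeTable is a made-up stand-in for the kernel's hash table, not the kernel code itself.

```python
GOLDEN_RATIO_PRIME_32 = 0x9E370001    # constant from the imul in the listing
HASH_SHIFT = 0x1A                     # shr $0x1a keeps the top 6 of 32 bits
NUM_BUCKETS = 1 << (32 - HASH_SHIFT)  # 64 buckets


def bucket_index(addr):
    """Model of the imul/shr pair: 32-bit multiplicative hash of an address."""
    return ((addr * GOLDEN_RATIO_PRIME_32) & 0xFFFFFFFF) >> HASH_SHIFT


class ProbeTable:
    """Hypothetical stand-in for the kprobe hash table (lists replace hlists)."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, addr, probe):
        self.buckets[bucket_index(addr)].append((addr, probe))

    def lookup(self, addr):
        # Mirrors the loop at 0x1d..0x32: walk one bucket, compare addresses.
        for stored_addr, probe in self.buckets[bucket_index(addr)]:
            if stored_addr == addr:
                return probe
        return None
```

In this model a lookup only ever touches one bucket, so a fault inside the loop would point at a bad node in that bucket's chain rather than at the hash arithmetic.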
From: wcohen at redhat dot com @ 2008-03-19 12:56 UTC (permalink / raw)
To: systemtap
------- Additional Comments From wcohen at redhat dot com 2008-03-19 12:55 -------
I have only tested this on a vmware machine running F-9. I have not tested this
on bare hardware. This is a uniprocessor machine. Maybe this is something that
vmware is not handling properly. I have a F-8 image on the same vmware machine.
I will check whether the test runs correctly there.
The test correctly prints the message that it applies only to
multiprocessor (SMP) machines and then crashes. The test appears to be exiting.
Could there be a race in which the handler for the probe is removed too early?
Or is the problem elsewhere, and it merely manifests in that particular place
because the data structures are incorrect?
Are there other changes in kprobes between the F-8 (2.6.24.3-34.fc8) and F-9
(2.6.25-0.121.rc5.git4.fc9) kernels?
From: wcohen at redhat dot com @ 2008-03-19 14:22 UTC (permalink / raw)
To: systemtap
------- Additional Comments From wcohen at redhat dot com 2008-03-19 14:21 -------
Trying the test on the F-8 vmware machine causes it to crash in the same manner.
I am beginning to think this is an issue with vmware. I have an F-8 i686 machine
running the same kernel and checkout of systemtap. It is a dual processor machine.
I set it to be uniprocessor and ran the test repeatedly, and it ran fine.
From: ananth at in dot ibm dot com @ 2008-03-19 14:30 UTC (permalink / raw)
To: systemtap
------- Additional Comments From ananth at in dot ibm dot com 2008-03-19 14:30 -------
That an upstream kernel runs fine leads me to suspect the Fedora kernel. That
said, there have been updates to kprobes between 2.6.24 and 2.6.25-rc6 upstream
kernels, but none of them are in the general vicinity that'd cause such a crash.
(They are mostly related to the kretprobe entry handler, Masami's kretprobe
bugfix, and my CONFIG_KRETPROBE addition).
I don't think this is a handler issue. The problem seems to be at get_kprobe()
time, which lends suspicion of kp.hlist being corrupt. kp.hlist is the first
element of the structure, and *maybe* some pointer/data manipulation is leading
to this getting corrupt.
Also, probing scheduler_tick() directly using a plain C kprobe module on a
kernel that exhibits the crash, works fine. Even stap invoked on just the script
also puts out a warning and doesn't crash the system.
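To illustrate why hlist being the first member matters, here is a small ctypes model. The field order and the 32-bit sizes are assumptions about the i386 layout of struct kprobe (chosen so that addr lands at offset 0x18, matching the cmp in the earlier listing); KprobePrefix is a made-up name and covers only the leading fields.

```python
import ctypes


class HlistNode(ctypes.Structure):
    _fields_ = [("next", ctypes.c_uint32), ("pprev", ctypes.c_uint32)]


class ListHead(ctypes.Structure):
    _fields_ = [("next", ctypes.c_uint32), ("prev", ctypes.c_uint32)]


class KprobePrefix(ctypes.Structure):
    # Assumed leading fields of struct kprobe on i386 (pointers as 32-bit ints).
    _fields_ = [
        ("hlist", HlistNode),
        ("list", ListHead),
        ("mod_refcounted", ctypes.c_uint32),
        ("nmissed", ctypes.c_uint32),
        ("addr", ctypes.c_uint32),
    ]


kp = KprobePrefix()
kp.hlist.next = 0xC1000000
kp.addr = 0xC0123456

# A stray 4-byte store through a pointer to the start of the structure
# lands exactly on hlist.next, because hlist sits at offset 0.
stray = ctypes.c_uint32.from_address(ctypes.addressof(kp))
stray.value = 0xDEADBEEF
```

After the stray store, kp.hlist.next holds the bogus value while kp.addr is untouched, which is the kind of damage that would only surface later, when the bucket chain is walked.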
From: fche at redhat dot com @ 2008-04-17 17:05 UTC (permalink / raw)
To: systemtap
------- Additional Comments From fche at redhat dot com 2008-04-17 14:00 -------
I've seen similar crashes lately on my KVM virtual machines.
I wonder if this is a systemic problem with these emulators.
Even if so, it is probably worth working around it in systemtap
if possible, by perhaps adding a bunch of redundant cache-flush,
schedule() type calls into the shutdown sequence.
From: fche at redhat dot com @ 2008-07-09 17:13 UTC (permalink / raw)
To: systemtap
------- Additional Comments From fche at redhat dot com 2008-07-09 17:12 -------
I found a race condition in the runtime w.r.t. scripts that
call exit() during their begin probes. Testing a patch.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|systemtap at sources dot    |fche at redhat dot com
                   |redhat dot com              |
             Status|NEW                         |ASSIGNED