* [Bug runtime/3911] New: Compilation of systemtap causes the system to crash on p570 system.
@ 2007-01-24 9:50 srinivasa at in dot ibm dot com
2007-01-24 15:13 ` [Bug runtime/3911] " mmlnx at us dot ibm dot com
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: srinivasa at in dot ibm dot com @ 2007-01-24 9:50 UTC (permalink / raw)
To: systemtap
My environment is systemtap (systemtap-20070120.tar.bz2),
elfutils (elfutils-0.124.tar.gz), kernel (2.6.18-1.2961.el5), on an LPAR'ed p570 system.
I was compiling the latest systemtap source on a RHEL5 (2.6.18-1.2961.el5) kernel
and the system dropped to xmon.
The screen looks like this:
==================================
FAIL: probefunc:kernel.statement(0xc00000000005be34) compilation
FAIL: probefunc:kernel.function("scheduler_tick") compilation
FAIL: probefunc:kernel.inline("context_switch") compilation
Running
/home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.base/simple.exp
...
Running
/home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.base/timeofday.exp
...
FAIL: timeofday test compilation
Running
/home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.base/timers.exp
...
Running
/home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.base/tri.exp
...
Running
/home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.maps/absentstats.exp
...
/////System crashed/////
=========================================
What xmon shows
=============================
Unable to handle kernel paging request for data at address 0x420000000007f
5:mon> e
cpu 0x5: Vector: 300 (Data Access) at [c000000028a6b310]
pc: c000000000349f40: ._spin_lock+0x20/0x88
lr: c0000000000d6640: .__cache_alloc_node+0x4c/0x174
sp: c000000028a6b590
msr: 8000000000001032
dar: 420000000007f
dsisr: 40000000
current = 0xc00000002ad74430
paca = 0xc000000000464d00
pid = 15804, comm = staprun
5:mon> t
[c000000028a6b610] c0000000000d6640 .__cache_alloc_node+0x4c/0x174
[c000000028a6b6b0] c0000000000d6cc8 .kmem_cache_alloc_node+0x104/0x12c
[c000000028a6b750] d000000000aa6f18 ._stp_map_init+0xa0/0x150 [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6b800] d000000000aa70d4 ._stp_pmap_new+0x10c/0x1f0 [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6b8c0] d000000000aa7328 ._stp_pmap_new_ix+0x170/0x28c [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6b970] d000000000aa757c .systemtap_module_init+0x138/0x254 [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6ba20] d000000000aa76a8 .probe_start+0x10/0x2c [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6baa0] d000000000aa7734 ._stp_handle_start+0x70/0x10c [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6bbb0] d000000000aa79b8 ._stp_proc_write_cmd+0x1e8/0x9b0 [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6bcf0] c0000000000e026c .vfs_write+0x118/0x200
[c000000028a6bd90] c0000000000e09dc .sys_write+0x4c/0x8c
[c000000028a6be30] c00000000000869c syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000008072e9461c
SP (ffff86eee90) is in userspace
5:mon> di c000000000349f40 (PC value)
c000000000349f40 7d20f828 lwarx r9,r0,r31
c000000000349f44 2c090000 cmpwi r9,0
c000000000349f48 40820010 bne c000000000349f58 # ._spin_lock+0x38/0x88
c000000000349f4c 7c00f92d stwcx. r0,r0,r31
c000000000349f50 40a2fff0 bne c000000000349f40 # ._spin_lock+0x20/0x88
5:mon> r
R00 = 0000000080000005 R16 = 0000000010005b18
R01 = c000000028a6b590 R17 = 0000000010005b10
R02 = c000000000578f80 R18 = 00000ffff86ef390
R03 = 000420000000007f R19 = 0000000010005b38
R04 = 00000000000012d0 R20 = 0000008072d50698
R05 = 0000000000000010 R21 = 0000000010005c28
R06 = 0000000000000030 R22 = 00000000100170c8
R07 = 0000000000000220 R23 = 0000000000000002
R08 = 0000000000000010 R24 = 0000000000000010
R09 = c00000002dfe2980 R25 = 0000000000000800
R10 = c000000000605448 R26 = 0000000000000010
R11 = 0000000000000000 R27 = c00000002dfe2900
R12 = d000000000aaa3f0 R28 = 8000000000009032
R13 = c000000000464d00 R29 = 00000000000012d0
R14 = 00000000100170c0 R30 = c0000000004a59b8
R15 = 0000000010005638 R31 = 000420000000007f
pc = c000000000349f40 ._spin_lock+0x20/0x88
lr = c0000000000d6640 .__cache_alloc_node+0x4c/0x174
msr = 8000000000001032 cr = 24000444
ctr = c0000000000d6cf0 xer = 0000000020000000 trap = 300
dar = 000420000000007f dsisr = 40000000
======================================
Thanks
Srinivasa Ds
--
Summary: Compilation of systemtap causes the system to crash on
p570 system.
Product: systemtap
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: runtime
AssignedTo: systemtap at sources dot redhat dot com
ReportedBy: srinivasa at in dot ibm dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=3911
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: mmlnx at us dot ibm dot com @ 2007-01-24 15:13 UTC (permalink / raw)
To: systemtap
------- Additional Comments From mmlnx at us dot ibm dot com 2007-01-24 15:13 -------
I doubt this was caused by the compilation. The tests run in parallel so it's
difficult to associate what you see on the screen with the test that caused the
problem. Is this crash repeatable?
Also, you're using an old elfutils. Try using elfutils-0.125 and see if you can
repeat the crash.
* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: fche at redhat dot com @ 2007-01-25 20:27 UTC (permalink / raw)
To: systemtap
------- Additional Comments From fche at redhat dot com 2007-01-25 20:27 -------
(In reply to comment #0)
> I was compiling latest systemtap source on rhel5(2.6.18-1.2961.el5) kernel and
> system dropped to xmon.
Not really just compiling: you were running test cases.
> What xmon shows
> =============================
> Unable to handle kernel paging request for data at address 0x420000000007f
> 5:mon> e
> cpu 0x5: Vector: 300 (Data Access) at [c000000028a6b310]
> pc: c000000000349f40: ._spin_lock+0x20/0x88
This resembles a memory corruption that may even precede this systemtap module.
It is certainly *before* any probes are even registered, let alone run. The
runtime is the only part more or less running by this time.
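One way to see why the faulting value looks like corruption: the `lwarx` at the PC reads through r31, and r31 (equal to dar) is 0x000420000000007f, while every kernel or module address elsewhere in the dump begins with 0xc or 0xd. The sketch below encodes that observation as a heuristic. The function names are hypothetical, and the "top nibble" test is an assumption drawn only from the addresses visible in this dump, not a complete description of the ppc64 address space.

```c
#include <stdint.h>
#include <stdbool.h>

/* Heuristic only: every kernel or module address in the xmon dump above
 * begins with 0xc... or 0xd..., so any other top nibble is treated as
 * suspect. This is an assumption for illustration. */
static bool looks_like_kernel_address(uint64_t addr)
{
    uint64_t top_nibble = addr >> 60;
    return top_nibble == 0xc || top_nibble == 0xd;
}

/* The faulting lwarx reads through r31, so the lock pointer is suspect
 * when the value held in r31 fails the heuristic. */
static bool lock_pointer_is_suspect(uint64_t r31)
{
    return !looks_like_kernel_address(r31);
}
```

Calling `lock_pointer_is_suspect()` with the r31 value from the register dump flags it, while a pointer such as the R09 value passes the check.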
* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: srinivasa at in dot ibm dot com @ 2007-01-29 8:31 UTC (permalink / raw)
To: systemtap
------- Additional Comments From srinivasa at in dot ibm dot com 2007-01-29 08:30 -------
I am able to reproduce this bug on the latest upstream kernel, 2.6.20-rc6, as well.
* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: srinivasa at in dot ibm dot com @ 2007-01-30 7:17 UTC (permalink / raw)
To: systemtap
------- Additional Comments From srinivasa at in dot ibm dot com 2007-01-30 07:17 -------
Here is my analysis of this bug
======================================
1) Looking at the backtrace, _stp_map_init() calls kmalloc_node() with the CPU
number as the node argument. kmalloc_node() is the same as kmalloc() if NUMA is
not configured; if NUMA is configured, it calls kmem_cache_alloc_node().
_stp_map_init() is called by _stp_pmap_new() inside a for_each_cpu() loop.
=============================================
static int _stp_map_init(MAP m, unsigned max_entries, int type, int key_size,
                         int data_size, int cpu)
{
        int size;
        .....................................
        .....................................
        for (i = 0; i < max_entries; i++) {
                if (cpu < 0)
                        tmp = kmalloc(size, STP_ALLOC_FLAGS);
                else
                        tmp = kmalloc_node(size, STP_ALLOC_FLAGS, cpu);
                if (!tmp)
                        return -1;
                dbug ("allocated %lx\n", (long)tmp);
=========================================================================
static PMAP _stp_pmap_new(unsigned max_entries, int type, int key_size,
                          int data_size)
{
        int i;
        MAP map, m;
        PMAP pmap = (PMAP) kmalloc(sizeof(struct pmap), STP_ALLOC_FLAGS);
        if (pmap == NULL)
                return NULL;
        .........................
        .........................
        for_each_cpu(i) {
                m = per_cpu_ptr (map, i);
                if (_stp_map_init(m, max_entries, type, key_size, data_size, i)) {
                        goto err1;
                }
        }
============================================
2) Since NUMA is configured on my system, kmalloc_node() calls
kmem_cache_alloc_node() with "cpu" as the node ID.
====================================================
#ifdef CONFIG_NUMA
extern void *__kmalloc_node(size_t size, gfp_t flags, int node);
static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
        if (__builtin_constant_p(size)) {
                int i = 0;
#define CACHE(x) \
                .......................
                ..............................
                return kmem_cache_alloc_node((flags & GFP_DMA) ?
                                             malloc_sizes[i].cs_dmacachep :
                                             malloc_sizes[i].cs_cachep, flags, node);
        }
========================================================
3) This means the systemtap code expects the number of NUMA nodes to be the same
as the number of CPUs.
kmem_cache_alloc_node() in turn calls __cache_alloc_node() (the top frame in the
backtrace), where cachep->nodelists[nodeid] gives a wrong address, because on my
system there are fewer nodes than CPUs.
================================
Mount-cache hash table entries: 4096
Processor 1 found.
Processor 2 found.
Processor 3 found.
Processor 4 found.
Processor 5 found.
Processor 6 found.
Processor 7 found.
Brought up 8 CPUs
Node 0 CPUs: 0-7
Node 1 CPUs:
Node 2 CPUs:
Node 3 CPUs:
sizeof(vma)=176 bytes
sizeof(page)=56 bytes
sizeof(inode)=560 bytes
=================================
For example, I have 8 CPUs and 4 NUMA nodes in my system. An index such as
nodelists[8] gives a wrong address, and that is what causes the oops.
============================================================
static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
                                  int nodeid)
{
        struct list_head *entry;
        struct slab *slabp;
        struct kmem_list3 *l3;
        void *obj;
        int x;
        l3 = cachep->nodelists[nodeid];  <<<========PC is here
        BUG_ON(!l3);
============================================
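The mismatch described above can be modeled in user space. This is a minimal sketch, assuming a table with exactly 4 valid node IDs (the 4 nodes in the boot log) and CPU indices passed straight through as node IDs, as in the per-CPU loop. The names `SKETCH_NR_NODES`, `node_id_in_range` and `count_out_of_range_lookups` are hypothetical. The kernel's real array bounds and the state of entries for nodes without CPUs or memory are not modeled here.

```c
#include <stdbool.h>

#define SKETCH_NR_NODES 4   /* assumption: the 4 nodes listed in the boot log */

/* A node ID is a safe index only if it names one of the configured nodes. */
static bool node_id_in_range(int nodeid)
{
    return nodeid >= 0 && nodeid < SKETCH_NR_NODES;
}

/* Count the CPUs whose index would be an out-of-range node ID if it were
 * passed straight through as the node argument. */
static int count_out_of_range_lookups(int nr_cpus)
{
    int bad = 0;
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        if (!node_id_in_range(cpu))
            bad++;
    }
    return bad;
}
```

With 8 CPUs and 4 nodes, half of the CPU indices (4 through 7) fall outside the valid node range in this model.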
Hence the assumption in systemtap modules that the number of nodes equals the
number of CPUs is what causes this bug.
Martin, any ideas?
Thanks
Srinivasa DS
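
One possible direction, sketched here in user space, is to translate the CPU index into a node ID before asking for node-local memory. The kernel provides cpu_to_node() for that translation; whether the eventual fix used it is not stated in this thread. The table below is a hypothetical stand-in filled in from the boot log line "Node 0 CPUs: 0-7", and `node_for_cpu` is an illustrative name, not a systemtap or kernel function.

```c
#define SKETCH_NR_CPUS 8   /* assumption: the 8 CPUs brought up in the boot log */

/* Hypothetical mapping taken from the boot log: "Node 0 CPUs: 0-7". */
static const int sketch_cpu_to_node[SKETCH_NR_CPUS] = { 0, 0, 0, 0, 0, 0, 0, 0 };

/* Return the node that owns `cpu`, or -1 ("no node preference") when the
 * CPU index is outside the table. */
static int node_for_cpu(int cpu)
{
    if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
        return -1;
    return sketch_cpu_to_node[cpu];
}
```

Returning -1 for an unknown CPU mirrors the `cpu < 0` branch in _stp_map_init(), which falls back to a plain kmalloc().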
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |hunt at redhat dot com
* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: fche at redhat dot com @ 2007-01-30 12:16 UTC (permalink / raw)
To: systemtap
------- Additional Comments From fche at redhat dot com 2007-01-30 12:15 -------
Very nice analysis.
* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: hunt at redhat dot com @ 2007-01-30 14:37 UTC (permalink / raw)
To: systemtap
------- Additional Comments From hunt at redhat dot com 2007-01-30 14:36 -------
Looks like there is a missing cpu to node mapping there. It's probably time to
just give up on NUMA for older kernels and just use alloc_percpu or percpu_alloc
as it is in the kernel.
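
The per-CPU alternative can be pictured with a small user-space model: one slot per CPU, indexed only by CPU number, with no node argument anywhere. This is a sketch under stated assumptions, not the kernel interface. `percpu_alloc_sketch` and `percpu_slot_sketch` are hypothetical names, and the kernel's own per-CPU allocator has a different signature and handles memory placement itself.

```c
#include <stdlib.h>
#include <stddef.h>

/* Allocate one zeroed slot of `slot_size` bytes per CPU in a single block.
 * Returns NULL on failure or when either argument is zero. */
static void *percpu_alloc_sketch(size_t nr_cpus, size_t slot_size)
{
    if (nr_cpus == 0 || slot_size == 0)
        return NULL;
    return calloc(nr_cpus, slot_size);
}

/* Address of the slot that belongs to `cpu`; the caller must keep
 * `cpu` below the nr_cpus value used at allocation time. */
static void *percpu_slot_sketch(void *base, size_t cpu, size_t slot_size)
{
    return (char *)base + cpu * slot_size;
}
```

Because the lookup key is the CPU number itself, the number of NUMA nodes never enters the calculation in this model.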
--
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: hunt at redhat dot com @ 2007-01-30 14:38 UTC (permalink / raw)
To: systemtap
--
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|systemtap at sources dot |hunt at redhat dot com
|redhat dot com |