* [Bug runtime/3911] New: Compilation of systemtap causes the system to crash on p570 system.
From: srinivasa at in dot ibm dot com @ 2007-01-24 9:50 UTC
To: systemtap

My environment is systemtap (systemtap-20070120.tar.bz2), elfutils (elfutils-0.124.tar.gz), kernel (2.6.18-1.2961.el5), on an LPARed p570 system.

I was compiling the latest systemtap source on a rhel5 (2.6.18-1.2961.el5) kernel and the system dropped to xmon. The screen looks like this:
==================================
FAIL: probefunc:kernel.statement(0xc00000000005be34) compilation
FAIL: probefunc:kernel.function("scheduler_tick") compilation
FAIL: probefunc:kernel.inline("context_switch") compilation
Running /home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.base/simple.exp ...
Running /home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.base/timeofday.exp ...
FAIL: timeofday test compilation
Running /home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.base/timers.exp ...
Running /home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.base/tri.exp ...
Running /home/systemtap/tmp/stap_testing_200701240903/src/testsuite/systemtap.maps/absentstats.exp ...
/////System crashed/////
=========================================
What xmon shows
=============================
Unable to handle kernel paging request for data at address 0x420000000007f
5:mon> e
cpu 0x5: Vector: 300 (Data Access) at [c000000028a6b310]
    pc: c000000000349f40: ._spin_lock+0x20/0x88
    lr: c0000000000d6640: .__cache_alloc_node+0x4c/0x174
    sp: c000000028a6b590
   msr: 8000000000001032
   dar: 420000000007f
 dsisr: 40000000
  current = 0xc00000002ad74430
  paca    = 0xc000000000464d00
    pid   = 15804, comm = staprun
5:mon> t
[c000000028a6b610] c0000000000d6640 .__cache_alloc_node+0x4c/0x174
[c000000028a6b6b0] c0000000000d6cc8 .kmem_cache_alloc_node+0x104/0x12c
[c000000028a6b750] d000000000aa6f18 ._stp_map_init+0xa0/0x150 [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6b800] d000000000aa70d4 ._stp_pmap_new+0x10c/0x1f0 [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6b8c0] d000000000aa7328 ._stp_pmap_new_ix+0x170/0x28c [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6b970] d000000000aa757c .systemtap_module_init+0x138/0x254 [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6ba20] d000000000aa76a8 .probe_start+0x10/0x2c [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6baa0] d000000000aa7734 ._stp_handle_start+0x70/0x10c [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6bbb0] d000000000aa79b8 ._stp_proc_write_cmd+0x1e8/0x9b0 [stap_a0def33066a20a930648d1bcbf25f718_896]
[c000000028a6bcf0] c0000000000e026c .vfs_write+0x118/0x200
[c000000028a6bd90] c0000000000e09dc .sys_write+0x4c/0x8c
[c000000028a6be30] c00000000000869c syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000008072e9461c
SP (ffff86eee90) is in userspace
5:mon> di c000000000349f40 (PC value)
c000000000349f40  7d20f828  lwarx   r9,r0,r31
c000000000349f44  2c090000  cmpwi   r9,0
c000000000349f48  40820010  bne     c000000000349f58  # ._spin_lock+0x38/0x88
c000000000349f4c  7c00f92d  stwcx.  r0,r0,r31
c000000000349f50  40a2fff0  bne     c000000000349f40  # ._spin_lock+0x20/0x88
5:mon> r
R00 = 0000000080000005   R16 = 0000000010005b18
R01 = c000000028a6b590   R17 = 0000000010005b10
R02 = c000000000578f80   R18 = 00000ffff86ef390
R03 = 000420000000007f   R19 = 0000000010005b38
R04 = 00000000000012d0   R20 = 0000008072d50698
R05 = 0000000000000010   R21 = 0000000010005c28
R06 = 0000000000000030   R22 = 00000000100170c8
R07 = 0000000000000220   R23 = 0000000000000002
R08 = 0000000000000010   R24 = 0000000000000010
R09 = c00000002dfe2980   R25 = 0000000000000800
R10 = c000000000605448   R26 = 0000000000000010
R11 = 0000000000000000   R27 = c00000002dfe2900
R12 = d000000000aaa3f0   R28 = 8000000000009032
R13 = c000000000464d00   R29 = 00000000000012d0
R14 = 00000000100170c0   R30 = c0000000004a59b8
R15 = 0000000010005638   R31 = 000420000000007f
pc  = c000000000349f40 ._spin_lock+0x20/0x88
lr  = c0000000000d6640 .__cache_alloc_node+0x4c/0x174
msr = 8000000000001032   cr  = 24000444
ctr = c0000000000d6cf0   xer = 0000000020000000   trap = 300
dar = 000420000000007f   dsisr = 40000000
======================================

Thanks
Srinivasa Ds

--
           Summary: Compilation of systemtap causes the system to crash on p570 system.
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: srinivasa at in dot ibm dot com

http://sourceware.org/bugzilla/show_bug.cgi?id=3911

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: mmlnx at us dot ibm dot com @ 2007-01-24 15:13 UTC
To: systemtap

------- Additional Comments From mmlnx at us dot ibm dot com 2007-01-24 15:13 -------
I doubt this was caused by the compilation. The tests run in parallel so it's difficult to associate what you see on the screen with the test that caused the problem.

Is this crash repeatable? Also, you're using an old elfutils. Try using elfutils-0.125 and see if you can repeat the crash.

* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: fche at redhat dot com @ 2007-01-25 20:27 UTC
To: systemtap

------- Additional Comments From fche at redhat dot com 2007-01-25 20:27 -------
(In reply to comment #0)
> I was compiling latest systemtap source on rhel5(2.6.18-1.2961.el5) kernel and
> system dropped to xmon.

Not really just compiling: you were running test cases.

> What xmon shows
> =============================
> Unable to handle kernel paging request for data at address 0x420000000007f
> 5:mon> e
> cpu 0x5: Vector: 300 (Data Access) at [c000000028a6b310]
> pc: c000000000349f40: ._spin_lock+0x20/0x88

This resembles a memory corruption that may even precede this systemtap module. It is certainly *before* any probes are even registered, let alone run. The runtime is the only part more or less running by this time.

* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: srinivasa at in dot ibm dot com @ 2007-01-29 8:31 UTC
To: systemtap

------- Additional Comments From srinivasa at in dot ibm dot com 2007-01-29 08:30 -------
I am able to reproduce this bug on the latest upstream kernel, 2.6.20-rc6, as well.

* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: srinivasa at in dot ibm dot com @ 2007-01-30 7:17 UTC
To: systemtap

------- Additional Comments From srinivasa at in dot ibm dot com 2007-01-30 07:17 -------
Here is my analysis of this bug
======================================
1) Looking at the backtrace, _stp_map_init() calls kmalloc_node() with the cpu number as its node argument. kmalloc_node() is the same as kmalloc() if NUMA is not configured, and it calls kmem_cache_alloc_node() if NUMA is configured. _stp_map_init() is called by _stp_pmap_new() inside a for_each_cpu() loop.
=============================================
static int _stp_map_init(MAP m, unsigned max_entries, int type, int key_size, int data_size, int cpu)
{
	int size;
	.....................................
	.....................................
	for (i = 0; i < max_entries; i++) {
		if (cpu < 0)
			tmp = kmalloc(size, STP_ALLOC_FLAGS);
		else
			tmp = kmalloc_node(size, STP_ALLOC_FLAGS, cpu);

		if (!tmp)
			return -1;

		dbug ("allocated %lx\n", (long)tmp);
=========================================================================
static PMAP _stp_pmap_new(unsigned max_entries, int type, int key_size, int data_size)
{
	int i;
	MAP map, m;

	PMAP pmap = (PMAP) kmalloc(sizeof(struct pmap), STP_ALLOC_FLAGS);
	if (pmap == NULL)
		return NULL;
	.........................
	.........................
	for_each_cpu(i) {
		m = per_cpu_ptr (map, i);
		if (_stp_map_init(m, max_entries, type, key_size, data_size, i)) {
			goto err1;
		}
	}
============================================
2) Since NUMA is configured on my system, kmalloc_node() calls kmem_cache_alloc_node() with "cpu" as the nodeid.
====================================================
#ifdef CONFIG_NUMA
extern void *__kmalloc_node(size_t size, gfp_t flags, int node);

static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
	if (__builtin_constant_p(size)) {
		int i = 0;
#define CACHE(x) \
		.......................
		..............................
		return kmem_cache_alloc_node((flags & GFP_DMA) ?
			malloc_sizes[i].cs_dmacachep :
			malloc_sizes[i].cs_cachep, flags, node);
	}
========================================================
3) This means the systemtap code expects the number of NUMA nodes to be the same as the number of cpus. kmem_cache_alloc_node() in turn calls ____cache_alloc_node(), where cachep->nodelists[nodeid] gives a wrong address because on my system there are fewer nodes than cpus.
================================
Mount-cache hash table entries: 4096
Processor 1 found.
Processor 2 found.
Processor 3 found.
Processor 4 found.
Processor 5 found.
Processor 6 found.
Processor 7 found.
Brought up 8 CPUs
Node 0 CPUs: 0-7
Node 1 CPUs:
Node 2 CPUs:
Node 3 CPUs:
sizeof(vma)=176 bytes
sizeof(page)=56 bytes
sizeof(inode)=560 bytes
=================================
For example: I have 8 cpus in my system and 4 NUMA nodes. nodelists[8] gives a wrong address, and that is causing the oops.
============================================================
static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
	struct list_head *entry;
	struct slab *slabp;
	struct kmem_list3 *l3;
	void *obj;
	int x;

	l3 = cachep->nodelists[nodeid];   <<<========PC is here
	BUG_ON(!l3);
============================================
Hence the assumption in systemtap modules that the number of nodes equals the number of cpus is causing this bug.

Martin, any ideas?

Thanks
Srinivasa DS

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hunt at redhat dot com
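
The mismatch described in the analysis above can be modelled in user space. This is only a sketch under stated assumptions, not the systemtap runtime or the kernel slab code: the sizes are taken from the boot log quoted in the message (8 CPUs, 4 nodes, all CPUs on node 0), and every name below (node_list, model_cpu_to_node, model_node_lookup, lookup_by_cpu_id, lookup_by_mapped_node) is hypothetical. The model returns NULL for an out-of-range node id, whereas the kernel code quoted above indexes nodelists[] without such a check.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS  8  /* "Brought up 8 CPUs" in the quoted boot log */
#define MODEL_NR_NODES 4  /* Node 0 .. Node 3 in the quoted boot log */

/* Hypothetical stand-in for one per-node slab list. */
struct node_list {
	int node_id;
};

/* One entry per node, playing the role of cachep->nodelists[]. */
static struct node_list model_nodelists[MODEL_NR_NODES] = {
	{ 0 }, { 1 }, { 2 }, { 3 }
};

/* Hypothetical CPU-to-node table: the boot log shows every CPU on node 0. */
int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return 0;
}

/* Bounds-checked lookup: NULL marks an index the node array does not have. */
struct node_list *model_node_lookup(int nodeid)
{
	if (nodeid < 0 || nodeid >= MODEL_NR_NODES)
		return NULL;
	return &model_nodelists[nodeid];
}

/* Pattern from the analysis: the CPU number is passed where a node id is expected. */
struct node_list *lookup_by_cpu_id(int cpu)
{
	return model_node_lookup(cpu);
}

/* Alternative: translate the CPU number to its node before the lookup. */
struct node_list *lookup_by_mapped_node(int cpu)
{
	return model_node_lookup(model_cpu_to_node(cpu));
}
```

In this model lookup_by_cpu_id() succeeds only for CPU numbers that happen to be smaller than the node count, while lookup_by_mapped_node() returns node 0 for every CPU. In kernel code the translation step would presumably use a helper such as cpu_to_node(); whether that is the right fix for the systemtap runtime is left open here.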

* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: fche at redhat dot com @ 2007-01-30 12:16 UTC
To: systemtap

------- Additional Comments From fche at redhat dot com 2007-01-30 12:15 -------
Very nice analysis.

* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: hunt at redhat dot com @ 2007-01-30 14:37 UTC
To: systemtap

------- Additional Comments From hunt at redhat dot com 2007-01-30 14:36 -------
Looks like there is a missing cpu to node mapping there.

It's probably time to just give up on NUMA for older kernels and just use alloc_percpu or percpu_alloc as it is in the kernel.

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

* [Bug runtime/3911] Compilation of systemtap causes the system to crash on p570 system.
From: hunt at redhat dot com @ 2007-01-30 14:38 UTC
To: systemtap

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|systemtap at sources dot    |hunt at redhat dot com
                   |redhat dot com              |