* [Bug runtime/13714] New: panic when sampling backtrace() in timer.profile
@ 2012-02-20 6:29 brendan.gregg at joyent dot com
2012-02-20 14:30 ` [Bug runtime/13714] " mjw at redhat dot com
2012-02-20 22:08 ` mjw at redhat dot com
0 siblings, 2 replies; 3+ messages in thread
From: brendan.gregg at joyent dot com @ 2012-02-20 6:29 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=13714
Bug #: 13714
Summary: panic when sampling backtrace() in timer.profile
Product: systemtap
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: runtime
AssignedTo: systemtap@sourceware.org
ReportedBy: brendan.gregg@joyent.com
Classification: Unclassified
The kernel can panic when profiling stack traces on Fedora 16.
window1 (sample workload):
# while :; do tar cf /dev/null /usr; done
window2 (profiling):
# stap -D MAXTRACE=100 -D MAXSTRINGLEN=4096 -ve 'global s; probe timer.profile
{ s[backtrace()] <<< 1; } probe end { foreach (i in s+) { print_stack(i);
printf("\t%d\n", @count(s[i])); } }'
This was tried about a dozen times, and paniced on four of them.
The system had kdump configured and collected a crash dump. Some details:
# crash /var/crash/2012-02-19-23\:08/vmcore
/usr/lib/debug/lib/modules/3.2.6-3.fc16.x86_64/vmlinux
[...]
KERNEL: /usr/lib/debug/lib/modules/3.2.6-3.fc16.x86_64/vmlinux
DUMPFILE: /var/crash/2012-02-19-23:08/vmcore
CPUS: 1
DATE: Sun Feb 19 15:04:50 2012
UPTIME: 00:05:41
LOAD AVERAGE: 1.39, 0.78, 0.32
TASKS: 72
NODENAME: 9d219ce8-cf52-409f-a14a-b210850f3231
RELEASE: 3.2.6-3.fc16.x86_64
VERSION: #1 SMP Mon Feb 13 20:35:42 UTC 2012
MACHINE: x86_64 (2393 Mhz)
MEMORY: 1 GB
PANIC: "[ 341.694043] Oops: 0000 [#1] SMP " (check log for details)
PID: 3922
COMMAND: "tar"
TASK: ffff880037878000 [THREAD_INFO: ffff88001783e000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
[...]
crash> bt -a
PID: 3922 TASK: ffff880037878000 CPU: 0 COMMAND: "tar"
#0 [ffff88003fc037b0] machine_kexec at ffffffff81038bda
#1 [ffff88003fc03820] crash_kexec at ffffffff810b8482
#2 [ffff88003fc038f0] oops_end at ffffffff815e2b08
#3 [ffff88003fc03920] no_context at ffffffff815d6a05
#4 [ffff88003fc03980] __bad_area_nosemaphore at ffffffff815d6bdd
#5 [ffff88003fc039e0] bad_area_nosemaphore at ffffffff815d6c0f
#6 [ffff88003fc039f0] do_page_fault at ffffffff815e55e6
#7 [ffff88003fc03b00] page_fault at ffffffff815e20b5
[exception RIP: get_uleb128+83]
RIP: ffffffffa00773f3 RSP: ffff88003fc03bb8 RFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffffffffffff RDI: ffff88003fc03bf0
RBP: ffff88003fc03bb8 R8: 0000000000000030 R9: 0000000000000007
R10: 0000000000000001 R11: 0000000000000040 R12: ffff88003aae1088
R13: ffff88001783ffd8 R14: 0000000000000000 R15: ffff88003aae1088
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#8 [ffff88003fc03bc0] compute_expr at ffffffffa007c63f
[stap_ddfda8cbbf4994e727c76bf533732ecf_3874]
#9 [ffff88003fc03c70] unwind_frame at ffffffffa007df6c
[stap_ddfda8cbbf4994e727c76bf533732ecf_3874]
#10 [ffff88003fc03dc0] probe_1993 at ffffffffa007fcc6
[stap_ddfda8cbbf4994e727c76bf533732ecf_3874]
#11 [ffff88003fc03e40] enter_profile_probes at ffffffffa007b2c5
[stap_ddfda8cbbf4994e727c76bf533732ecf_3874]
#12 [ffff88003fc03e70] profile_tick at ffffffff81098ea9
#13 [ffff88003fc03e90] tick_sched_timer at ffffffff810a0e4e
#14 [ffff88003fc03ec0] __run_hrtimer at ffffffff81093e90
#15 [ffff88003fc03f20] hrtimer_interrupt at ffffffff8109480b
#16 [ffff88003fc03f90] smp_apic_timer_interrupt at ffffffff815ec589
#17 [ffff88003fc03fb0] apic_timer_interrupt at ffffffff815ea45e
--- <IRQ stack> ---
#18 [ffff88001783fed0] apic_timer_interrupt at ffffffff815ea45e
RIP: ffffffffffffff10 RSP: 0000000000000203 RFLAGS: 00000010
RAX: ffff88003fc13780 RBX: 00000000008c2ed0 RCX: 0000000000000000
RDX: ffff88001783e000 RSI: 0000000000000008 RDI: ffff88001783ff78
RBP: ffffffff8119691f R8: 0000000000000001 R9: 0000000000000001
R10: 0000000000000001 R11: ffffffff811839f2 R12: ffff88001783fef8
R13: 00007fff549ffff0 R14: ffff88001783ff40 R15: ffffffff81196824
ORIG_RAX: 000000000000fe2e CS: ffffffff815e1e8d SS: ffff88001783ff88
bt: WARNING: possibly bogus exception frame
crash> dis get_uleb128+83
0xffffffffa00773f3 <get_uleb128+83>: movzbl (%rdx),%r10d
Version and system details:
# stap -V
Systemtap translator/driver (version 1.6/0.152 non-git sources)
Copyright (C) 2005-2011 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
enabled features: AVAHI LIBRPM LIBSQLITE3 NSS BOOST_SHARED_PTR
TR1_UNORDERED_MAP NLS
# uname -a
Linux 9d219ce8-cf52-409f-a14a-b210850f3231 3.2.6-3.fc16.x86_64 #1 SMP Mon Feb
13 20:35:42 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
Fedora release 16 (Verne)
This is Fedora as a guest running under KVM. SystemTap was installed via yum.
--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug runtime/13714] panic when sampling backtrace() in timer.profile
2012-02-20 6:29 [Bug runtime/13714] New: panic when sampling backtrace() in timer.profile brendan.gregg at joyent dot com
@ 2012-02-20 14:30 ` mjw at redhat dot com
2012-02-20 22:08 ` mjw at redhat dot com
1 sibling, 0 replies; 3+ messages in thread
From: mjw at redhat dot com @ 2012-02-20 14:30 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=13714
Mark Wielaard <mjw at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mjw at redhat dot com
--- Comment #1 from Mark Wielaard <mjw at redhat dot com> 2012-02-20 14:26:42 UTC ---
After some poking (also needed to increase MAXMAPENTRIES) I could finally
replicate it on 3.3.0-0.rc1.git6.1.fc17.x86_64 with
systemtap-devel-1.7-1.fc17.x86_64
[ 1305.976762] stap_e93c7971f27dbaecad41f45add08f2ea_2319: systemtap:
1.7/0.152, base: ffffffffa0662000, memory:
4291data/40text/86ctx/2058net/244893alloc kb, probes: 2
[ 1309.808090] BUG: unable to handle kernel NULL pointer dereference at
0000000000000007
[ 1309.809029] IP: [<ffffffffa06620c4>] get_uleb128+0x54/0x80
[stap_e93c7971f27dbaecad41f45add08f2ea_2319]
--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug runtime/13714] panic when sampling backtrace() in timer.profile
2012-02-20 6:29 [Bug runtime/13714] New: panic when sampling backtrace() in timer.profile brendan.gregg at joyent dot com
2012-02-20 14:30 ` [Bug runtime/13714] " mjw at redhat dot com
@ 2012-02-20 22:08 ` mjw at redhat dot com
1 sibling, 0 replies; 3+ messages in thread
From: mjw at redhat dot com @ 2012-02-20 22:08 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=13714
Mark Wielaard <mjw at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
AssignedTo|systemtap at sourceware dot |mjw at redhat dot com
|org |
--- Comment #2 from Mark Wielaard <mjw at redhat dot com> 2012-02-20 22:08:21 UTC ---
Found the root cause of this issue. It happens when a DW_CFA operation that
defines the CFA as dwarf expression is followed by a DW_CFA operation that
(re)defined the CFA as register+offset. In that case we forgot the reset the
REG_STATE.cfa_is_expr flag which made compute_expr() interpret the reg/offset
as expr pointer (because they share their values in a union).
While adding more sanity checks to make sure we catch such issues I found what
looks like bad CFI in the x86_64 kernel in common_interrupt
(arch/x86/kernel/entry_64.S) which defines CFI "by hand" and has a
CFI_DEF_CFA_REGISTER following a def_cfa_expression, which is invalid.
--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2012-02-20 22:08 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-20 6:29 [Bug runtime/13714] New: panic when sampling backtrace() in timer.profile brendan.gregg at joyent dot com
2012-02-20 14:30 ` [Bug runtime/13714] " mjw at redhat dot com
2012-02-20 22:08 ` mjw at redhat dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).