* [Bug runtime/14107] New: Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
@ 2012-05-14 15:41 mjw at redhat dot com
2012-05-14 15:48 ` [Bug runtime/14107] " mjw at redhat dot com
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: mjw at redhat dot com @ 2012-05-14 15:41 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=14107
Bug #: 14107
Summary: Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
Product: systemtap
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: runtime
AssignedTo: systemtap@sourceware.org
ReportedBy: mjw@redhat.com
CC: atomlin@redhat.com, bmr@redhat.com
Classification: Unclassified
The following program:
int
func (void)
{
  int *foo = (void *) 0x1234;  /* deliberately invalid address */
  *foo = 0x12345;              /* the write faults and raises SIGSEGV */
  return 0;
}

int
main (void)
{
  return func ();
}
compiled with gcc -o bad_code bad_code.c and the following stap script:
probe kernel.function("show_signal_msg") {
  /* (PF_USER | PF_WRITE) */
  if (execname() == "bad_code") {
    if ($error_code & 0x6) {
      printf ("\nUser mode process %s [pid: %d] received a SIGSEGV - error_code: 0x%x\n", execname(), pid(), $error_code)
      print_ubacktrace()
    }
  }
}
run with: stap -d ./bad_code --ldd show_signal_msg.stp -c ./bad_code
produces the following (correct) user backtrace on 3.3.5-2.fc16.x86_64:
User mode process bad_code [pid: 18431] received a SIGSEGV - error_code: 0x6
0x400484 : func+0x10/0x1d [/usr/local/build/systemtap-obj/bad_code]
0x40049a : main+0x9/0xf [/usr/local/build/systemtap-obj/bad_code]
0x7fd419d1069d : __libc_start_main+0xed/0x1c0 [/lib64/libc-2.14.90.so]
0x4003b9 : _start+0x29/0x2c [/usr/local/build/systemtap-obj/bad_code]
But on some other x86_64 kernels it produces:
WARNING: _stp_read_address failed to access memory location
User mode process bad_code [pid: 12152] received a SIGSEGV - error_code: 0x6
0x400484 : func+0x10/0x1d [/home/mark/build/systemtap-obj/bad_code]
Warning: child process exited with signal 11 (Segmentation fault)
WARNING: Number of errors: 0, skipped probes: 1
WARNING: /usr/local/install/systemtap/bin/staprun exited with status: 1
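
For readers unfamiliar with the 0x6 mask in the script: it selects the "write" and "user mode" bits of the x86 page-fault error code. The following is only a minimal sketch of that decoding; the helper name decode_error_code is made up for illustration, and the PF_* names are assumed to match the kernel's arch/x86/mm/fault.c of that era.

```python
# x86 page-fault error code bits (names assumed from the kernel's PF_* enum).
PF_BITS = [
    (0x01, "PF_PROT"),   # 1: protection fault, 0: no page found
    (0x02, "PF_WRITE"),  # 1: write access, 0: read access
    (0x04, "PF_USER"),   # 1: fault happened in user mode
    (0x08, "PF_RSVD"),   # 1: reserved bit set in a page-table entry
    (0x10, "PF_INSTR"),  # 1: fault was an instruction fetch
]


def decode_error_code(code):
    """Return the names of the PF_* bits set in a page-fault error code.

    Hypothetical helper, not part of systemtap or the kernel.
    """
    return [name for bit, name in PF_BITS if code & bit]


print(decode_error_code(0x6))  # ['PF_WRITE', 'PF_USER']
```

Note that `$error_code & 0x6` is non-zero when either bit is set; a test that requires both bits would be `($error_code & 0x6) == 0x6`.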
--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
* [Bug runtime/14107] Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
From: mjw at redhat dot com @ 2012-05-14 15:48 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=14107
--- Comment #1 from Mark Wielaard <mjw at redhat dot com> 2012-05-14 15:47:52 UTC ---
The issue is that on x86_64 (it doesn't happen on i686) stap tries to recover
the user space registers by unwinding the kernel stack. This succeeds on the
f16 kernel and then the unwinder takes those recovered registers to do the user
space unwind. But it fails on the rhel6 kernel. See -DDEBUG_UNWIND=99 output:
_stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
_stp_get_uregs:209: failed to recover user reg state
And then for user space the unwinder has to make do with partial register
values and fails...
* [Bug runtime/14107] Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
From: mjw at redhat dot com @ 2012-05-14 22:17 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=14107
--- Comment #2 from Mark Wielaard <mjw at redhat dot com> 2012-05-14 22:16:13 UTC ---
(In reply to comment #1)
> See -DDEBUG_UNWIND=99 output:
>
> _stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
> _stp_get_uregs:209: failed to recover user reg state
>
> And then for user space the unwinder has to make do with partial register
> values and fails...
According to /proc/kallsyms, the failing pc (0xffffffff814ef8f5) lies inside
page_fault, at page_fault+0x25:
ffffffff814ef8d0 T page_fault
ffffffff814ef900 T machine_check
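
The lookup behind that observation can be sketched as follows. This is only an illustration using the two kallsyms entries quoted above; the function name symbolize is hypothetical and is not how systemtap resolves symbols.

```python
import bisect

# Two entries copied from the /proc/kallsyms excerpt, sorted by address.
SYMBOLS = [
    (0xffffffff814ef8d0, "page_fault"),
    (0xffffffff814ef900, "machine_check"),
]


def symbolize(pc, symbols=SYMBOLS):
    """Return 'name+0xoff' for the closest symbol at or below pc, else None.

    Hypothetical helper for illustration only.
    """
    starts = [addr for addr, _ in symbols]
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0:
        return None  # pc is below the first known symbol
    addr, name = symbols[i]
    return "%s+0x%x" % (name, pc - addr)


# The pc at which _stp_get_uregs gave up (ret: -5).
print(symbolize(0xffffffff814ef8f5))  # page_fault+0x25
```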
* [Bug runtime/14107] Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
From: mjw at redhat dot com @ 2012-05-14 22:23 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=14107
--- Comment #3 from Mark Wielaard <mjw at redhat dot com> 2012-05-14 22:22:51 UTC ---
And we do actually go through do_page_fault just before this frame:
_stp_get_uregs:194: unwind levels: 17, ret: 0, pc=0xffffffff814f253e
unwind:1452: pc=ffffffff814f253d, ffffffff814f253e
unwind:1492: trying debug_frame
set_no_state_rule:375: reg=10, where=1
_stp_search_unwind_hdr:777: binary search for ffffffff814f253d
_stp_search_unwind_hdr:839: fde off=26520
_stp_search_unwind_hdr:849: returning fde=ffffffffa14be360 startLoc=ffffffff814f2500
unwind_frame:1184: kernel: fde=ffffffffa14be360
unwind_frame:1189: kernel: cie=ffffffffa14bde28
parse_fde_cie:282: map retAddrReg value 16 to reg_info idx 16
unwind_frame:1203: startLoc: ffffffff814f2500, endLoc: ffffffff814f2597
unwind_frame:1251: cie=ffffffffa14bde28 fde=ffffffffa14be360 startLoc=ffffffff814f2500 endLoc=ffffffff814f2597, pc=ffffffff814f253d
unwind_frame:1271: processCFI for CIE
[...]
unwind_frame:1426: returning 0 (ffffffff814ef8f5)
_stp_get_uregs:194: unwind levels: 16, ret: 0, pc=0xffffffff814ef8f5
unwind:1452: pc=ffffffff814ef8f4, ffffffff814ef8f5
unwind:1492: trying debug_frame
set_no_state_rule:375: reg=10, where=1
_stp_search_unwind_hdr:777: binary search for ffffffff814ef8f4
_stp_search_unwind_hdr:839: fde off=113238
_stp_search_unwind_hdr:849: returning fde=ffffffffa15ab078 startLoc=ffffffff814ef680
unwind_frame:1184: kernel: fde=ffffffffa15ab078
unwind_frame:1189: kernel: cie=ffffffffa15aafb0
parse_fde_cie:282: map retAddrReg value 16 to reg_info idx 16
unwind_frame:1203: startLoc: ffffffff814ef680, endLoc: ffffffff814ef707
unwind_frame:1205: pc (ffffffff814ef8f4) > endLoc(ffffffff814ef707)
unwind:1496: debug_frame failed: 1, trying eh_frame
unwind_frame:1168: Module kernel: no unwind frame data
_stp_get_uregs:194: unwind levels: 15, ret: -5, pc=0xffffffff814ef8f5
_stp_get_uregs:209: failed to recover user reg state
Since do_page_fault is the actual errorentry for page_fault, it looks like the
CFI for do_page_fault is wrong, or we don't process it correctly.
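
To make the failing step in the log easier to follow: the binary search returns the FDE with the nearest start address at or below the pc, and the range check then rejects it. The sketch below reproduces just that logic with the two FDE ranges from the log. The name find_fde is hypothetical, and treating endLoc as exclusive is an assumption (the log's own check reads "pc > endLoc").

```python
import bisect

# (startLoc, endLoc, name) for the two FDEs seen in the debug log, sorted by
# start address. The log does not name the first one, so it is left as None.
FDES = [
    (0xffffffff814ef680, 0xffffffff814ef707, None),
    (0xffffffff814f2500, 0xffffffff814f2597, "do_page_fault"),
]


def find_fde(pc, fdes=FDES):
    """Return the FDE tuple covering pc, or None when no FDE covers it.

    Hypothetical helper; a rough model of the search, not systemtap's code.
    """
    starts = [start for start, _, _ in fdes]
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0:
        return None
    start, end, _ = fdes[i]
    if pc >= end:  # assumption: endLoc is exclusive
        return None  # nearest FDE found, but pc is past its end
    return fdes[i]


print(find_fde(0xffffffff814f253d))  # covered by the do_page_fault FDE
print(find_fde(0xffffffff814ef8f4))  # None: no unwind data for this pc
```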
The CFI for do_page_fault looks as follows for 2.6.32-220.7.1.el6.x86_64:
[ 25fe8] CIE length=20
CIE_id: 18446744073709551615
version: 3
augmentation: ""
code_alignment_factor: 1
data_alignment_factor: -8
return_address_register: 16
Program:
def_cfa r7 (rsp) at offset 8
offset_extended_sf r16 (rip) at cfa-8
nop
nop
nop
nop
nop
[ 26520] FDE length=76 cie=[ 25fe8]
CIE_pointer: 155624
initial_location: 0xffffffff814f2500 <do_page_fault>
address_range: 0x97
Program:
advance_loc4 1 to 0x1
def_cfa_offset 16
offset_extended_sf r6 (rbp) at cfa-16
advance_loc4 3 to 0x4
def_cfa_register r6 (rbp)
advance_loc4 23 to 0x1b
offset_extended_sf r14 (r14) at cfa-24
offset_extended_sf r13 (r13) at cfa-32
offset_extended_sf r12 (r12) at cfa-40
offset_extended_sf r3 (rbx) at cfa-48
advance_loc4 83 to 0x6e
remember_state
restore r6 (rbp)
def_cfa r7 (rsp) at offset 8
restore r14 (r14)
restore r13 (r13)
restore r12 (r12)
restore r3 (rbx)
advance_loc4 1 to 0x6f
restore_state
nop
nop
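
As a rough illustration of how an unwinder evaluates the program above, the sketch below interprets the listed instructions and reports the rules in effect at a given offset into do_page_fault (0x3d for the pc in the log, i.e. ffffffff814f253d minus the start address). The tuple encoding, the function name unwind_row, and the choice to save the CFA together with the register rules on remember_state are all assumptions made for this example; the nops are omitted and this is not systemtap's implementation.

```python
# CIE initial instructions and the FDE program, transcribed from the dump.
# advance_loc carries the absolute "to" offset printed by readelf.
CIE_PROGRAM = [
    ("def_cfa", "rsp", 8),
    ("offset", "rip", -8),
]
FDE_PROGRAM = [
    ("advance_loc", 0x01),
    ("def_cfa_offset", 16),
    ("offset", "rbp", -16),
    ("advance_loc", 0x04),
    ("def_cfa_register", "rbp"),
    ("advance_loc", 0x1b),
    ("offset", "r14", -24),
    ("offset", "r13", -32),
    ("offset", "r12", -40),
    ("offset", "rbx", -48),
    ("advance_loc", 0x6e),
    ("remember_state",),
    ("restore", "rbp"),
    ("def_cfa", "rsp", 8),
    ("restore", "r14"),
    ("restore", "r13"),
    ("restore", "r12"),
    ("restore", "rbx"),
    ("advance_loc", 0x6f),
    ("restore_state",),
]


def unwind_row(target):
    """Return ((cfa_reg, cfa_off), {reg: offset_from_cfa}) at code offset target."""
    cfa, rules, initial, stack = None, {}, {}, []
    for program in (CIE_PROGRAM, FDE_PROGRAM):
        for op, *args in program:
            if op == "advance_loc":
                if args[0] > target:
                    return cfa, rules  # later instructions apply past target
            elif op == "def_cfa":
                cfa = (args[0], args[1])
            elif op == "def_cfa_offset":
                cfa = (cfa[0], args[0])
            elif op == "def_cfa_register":
                cfa = (args[0], cfa[1])
            elif op == "offset":
                rules[args[0]] = args[1]
            elif op == "restore":
                # Go back to the rule the CIE assigned to this register, if any.
                if args[0] in initial:
                    rules[args[0]] = initial[args[0]]
                else:
                    rules.pop(args[0], None)
            elif op == "remember_state":
                stack.append((cfa, dict(rules)))
            elif op == "restore_state":
                cfa, rules = stack.pop()
        if program is CIE_PROGRAM:
            initial = dict(rules)
    return cfa, rules


cfa, rules = unwind_row(0x3d)
print(cfa)           # ('rbp', 16)
print(rules["rip"])  # -8, i.e. the return address is stored at CFA-8
```

With these rules the return address for the do_page_fault frame is found at CFA-8, which matches the unwinder successfully returning ffffffff814ef8f5 for that frame in the log.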
* [Bug runtime/14107] Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
From: mjw at redhat dot com @ 2012-05-15 14:09 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=14107
--- Comment #4 from Mark Wielaard <mjw at redhat dot com> 2012-05-15 14:07:27 UTC ---
The problem isn't the CFI for do_page_fault, but that there is no CFI for
page_fault at all. Nor does there seem to be CFI for any other assembly symbol
defined in entry_64.S, which explains why unwinding to the kernel/user space
boundary just fails.
No idea yet why the CFI isn't included in /usr/lib/debug/lib/modules/*/vmlinux
for the RHEL6 kernel; it certainly is there in the entry_64.S source code. It
is also present in the Fedora vmlinux:
$ eu-readelf --debug-dump=frames \
    /usr/lib/debug/lib/modules/3.3.5-2.fc16.x86_64/vmlinux \
    | grep -B2 -A1 page_fault
[ 7ae0] FDE length=68 cie=[ 6da8]
CIE_pointer: 28072
initial_location: 0xffffffff815f4850 <page_fault>
address_range: 0x2a
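
The same check can be scripted when several symbols need to be verified at once. The sketch below only parses text shaped like the eu-readelf output quoted above; the function name fde_symbols is hypothetical, and it assumes eu-readelf prints the symbol in angle brackets after initial_location, as it does in this excerpt.

```python
import re

# Sample text in the shape of the eu-readelf excerpt above.
SAMPLE = """\
 [  7ae0] FDE length=68 cie=[  6da8]
   CIE_pointer:              28072
   initial_location:         0xffffffff815f4850 <page_fault>
   address_range:            0x2a
"""

# Matches the symbol name printed next to an FDE's initial_location.
INITIAL_LOCATION = re.compile(r"initial_location:\s+0x[0-9a-f]+\s+<([^>+]+)")


def fde_symbols(dump_text):
    """Return the set of symbol names that have an FDE in the dump text.

    Hypothetical helper for illustration only.
    """
    return {m.group(1) for m in INITIAL_LOCATION.finditer(dump_text)}


symbols = fde_symbols(SAMPLE)
print("page_fault" in symbols)  # True
```

Running the full dump of an affected vmlinux through such a helper and testing for page_fault would show directly whether the entry_64.S CFI made it into the debuginfo.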
* [Bug runtime/14107] Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
From: mjw at redhat dot com @ 2012-05-15 14:16 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=14107
--- Comment #5 from Mark Wielaard <mjw at redhat dot com> 2012-05-15 14:14:41 UTC ---
Looks like the RHEL6 kernel is missing this:
commit 9e565292270a2d55524be38835104c564ac8f795
Author: Roland McGrath <roland@redhat.com>
Date: Thu May 13 21:43:03 2010 -0700
x86: Use .cfi_sections for assembly code
The newer assemblers support the .cfi_sections directive so we can put
the CFI from .S files into the .debug_frame section that is preserved
in unstripped vmlinux and in separate debuginfo, rather than the
.eh_frame section that is now discarded by vmlinux.lds.S.
Signed-off-by: Roland McGrath <roland@redhat.com>
LKML-Reference: <20100514044303.A6FE7400BE@magilla.sf.frob.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* [Bug runtime/14107] Bad user unwinding from kernel fatal signal handler for some x86_64 kernels
From: mjw at redhat dot com @ 2012-05-21 11:02 UTC (permalink / raw)
To: systemtap
http://sourceware.org/bugzilla/show_bug.cgi?id=14107
Mark Wielaard <mjw at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
--- Comment #6 from Mark Wielaard <mjw at redhat dot com> 2012-05-21 11:01:19 UTC ---
Not really "fixed", but with the proper kernel patch (see comment #5) this
should just work. Added a testcase to check that the system behaves properly.
commit 07c9d78ebb28b888f01aed9c206e724f0e72db25
Author: Mark Wielaard <mjw@redhat.com>
Date: Mon May 21 12:57:41 2012 +0200
Add testcase for PR14107 Bad user unwinding from kernel fatal signal handler
This is really a kernel bug, see bug report, when the CFI for the assembly
code is missing we cannot properly recover the register state for the user
process and might give a bad/missing user backtrace.