* [Bug runtime/20742] New: kernel pointer dereference BUG on RHEL6 s390x
@ 2016-10-27 19:47 dsmith at redhat dot com
2016-11-01 20:15 ` [Bug runtime/20742] " dsmith at redhat dot com
2016-11-10 16:00 ` dsmith at redhat dot com
0 siblings, 2 replies; 3+ messages in thread
From: dsmith at redhat dot com @ 2016-10-27 19:47 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=20742
Bug ID: 20742
Summary: kernel pointer dereference BUG on RHEL6 s390x
Product: systemtap
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: runtime
Assignee: systemtap at sourceware dot org
Reporter: dsmith at redhat dot com
Target Milestone: ---
Host: s390x
When running the testsuite (in standard mode, not in parallel mode) on RHEL6
(2.6.32-642.el6.s390x.debug), I'm seeing the following crash:
====
Unable to handle kernel pointer dereference at virtual kernel address 000000007c264000
Oops: 0011 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: modloop(U) uprobes(U) ipv6 qeth_l2 vmur qeth qdio lcs ctcm
fsm ccwgroup ext4 jbd2 mbcache dasd_fba_mod dasd_eckd_mod dasd_mod dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: modloop]
CPU: 0 Not tainted 2.6.32-642.el6.s390x.debug #1
Process loop (pid: 44856, task: 0000000062e24340, ksp: 000000007b91f978)
Krnl PSW : 0704000180000000 0000000000513600 (down_write+0x74/0x114)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000000 ffffffff00000001 0000000000000000 0000000062e24340
00000000005135fa 0000000000000000 0000000000000002 0000000036726188
000000007c264bf0 000000007a88e288 000000007c264bf0 000000007c264c48
000003e001fab084 000000000051f8d0 00000000005135fa 000000007b91fc08
Krnl Code: 00000000005135ee: e310f0a00024 stg %r1,160(%r15)
00000000005135f4: c0e5ffe3fb38 brasl %r14,192c64
00000000005135fa: e310d0000004 lg %r1,0(%r13)
>0000000000513600: e320a0000004 lg %r2,0(%r10)
0000000000513606: b9020022 ltgr %r2,%r2
000000000051360a: a7740007 brc 7,513618
000000000051360e: eb21a0000030 csg %r2,%r1,0(%r10)
0000000000513614: a744fff9 brc 4,513606
Call Trace:
([<00000000005135fa>] down_write+0x6e/0x114)
[<000003e001fab084>] uprobe_report_clone+0xb0/0x71c [uprobes]
[<00000000001bde74>] utrace_report_clone+0xbc/0x170
[<000000000014c754>] do_fork+0x3d8/0x47c
[<000000000010b2a2>] SyS_clone+0x62/0x70
[<000000000011a272>] sysc_tracego+0xe/0x14
[<0000004f48ec35d4>] 0x4f48ec35d4
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<0000000000192d46>] lock_acquire+0xe2/0x138
Kernel panic - not syncing: Fatal exception: panic_on_oops
CPU: 0 Tainted: G D -- ------------ 2.6.32-642.el6.s390x.debug #1
Process loop (pid: 44856, task: 0000000062e24340, ksp: 000000007b91f978)
000000007b91f8a8 000000007b91f828 0000000000000002 0000000000000000
000000007b91f8c8 000000007b91f840 000000007b91f840 000000000050ed16
0000000000000000 0000000000000001 0000000000060011 0000000000000088
000000000000000d 000000000000000c 000000007b91f898 0000000000000000
0000000000000000 0000000000105fbc 000000007b91f828 000000007b91f868
Call Trace:
([<0000000000105eb4>] show_trace+0xf0/0x148)
[<000000000050eb40>] panic+0xcc/0x240
[<0000000000106502>] die+0x162/0x164
[<00000000001014f6>] do_no_context+0xae/0xec
[<00000000005158f8>] do_dat_exception+0x218/0x318
[<000000000011a43a>] pgm_exit+0x0/0x14
[<0000000000513600>] down_write+0x74/0x114
([<00000000005135fa>] down_write+0x6e/0x114)
[<000003e001fab084>] uprobe_report_clone+0xb0/0x71c [uprobes]
[<00000000001bde74>] utrace_report_clone+0xbc/0x170
[<000000000014c754>] do_fork+0x3d8/0x47c
[<000000000010b2a2>] SyS_clone+0x62/0x70
[<000000000011a272>] sysc_tracego+0xe/0x14
[<0000004f48ec35d4>] 0x4f48ec35d4
INFO: lockdep is turned off.
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
====
The last test run in systemtap.log is systemtap.unprivileged/pr16806.exp.
--
You are receiving this mail because:
You are the assignee for the bug.
* [Bug runtime/20742] kernel pointer dereference BUG on RHEL6 s390x
From: dsmith at redhat dot com @ 2016-11-01 20:15 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=20742
--- Comment #1 from David Smith <dsmith at redhat dot com> ---
Created attachment 9606
--> https://sourceware.org/bugzilla/attachment.cgi?id=9606&action=edit
proposed patch
Here's a patch that attempts to fix this problem. There are actually 3 fixes in
this patch.
Fix #1: uprobe_report_clone() did the following:
====
rcu_read_lock();
ptask = (struct uprobe_task *)rcu_dereference(engine->data);
uproc = ptask->uproc;
rcu_read_unlock();
/*
* Lock uproc so no new uprobes can be installed 'til all
* report_clone activities are completed. Lock uproc_table
* in case we have to run uprobe_fork_uproc().
*/
lock_uproc_table();
down_write(&uproc->rwsem);
====
This isn't correct. While we hold rcu_read_lock(), the ptask and uproc
pointers should be valid. However, outside of the RCU protection, the
ptask/uproc memory can be freed.
To fix this, I changed the code to the following:
====
rcu_read_lock();
ptask = (struct uprobe_task *)rcu_dereference(engine->data);
BUG_ON(!ptask);
/* Keep uproc intact until just before we return. */
uproc = uprobe_get_process(ptask->uproc);
rcu_read_unlock();
if (!uproc)
/* uprobe_free_process() has probably clobbered utask->proc. */
return UTRACE_DETACH;
/*
* Lock uproc so no new uprobes can be installed 'til all
* report_clone activities are completed. Lock uproc_table
* in case we have to run uprobe_fork_uproc().
*/
lock_uproc_table();
down_write(&uproc->rwsem);
====
This changes the code to increase the reference count on uproc so it won't get
freed. Similar code is present in uprobe_report_signal(), uprobe_report_exit(),
etc.
I also added code to uprobe_report_clone() to decrement the reference count on
uproc before returning.
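The get/put pattern described above can be sketched in plain userspace C. This is only an illustration of the idea, not the code from the attached patch: struct demo_proc and the demo_proc_*() helpers are hypothetical names, and C11 atomics stand in for the kernel's reference counting. In the real code the "get" happens inside the rcu_read_lock() section, so the object cannot be freed between reading the pointer and taking the reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the uprobes process object; not the real struct. */
struct demo_proc {
    atomic_int refcount;
};

/* Create an object holding one reference (the "owner" reference). */
static struct demo_proc *demo_proc_new(void)
{
    struct demo_proc *p = malloc(sizeof(*p));

    if (p)
        atomic_init(&p->refcount, 1);
    return p;
}

/* Take an extra reference. Returns NULL when given NULL. */
static struct demo_proc *demo_proc_get(struct demo_proc *p)
{
    if (p)
        atomic_fetch_add(&p->refcount, 1);
    return p;
}

/* Drop one reference. Frees the object and returns 1 on the last put. */
static int demo_proc_put(struct demo_proc *p)
{
    if (atomic_fetch_sub(&p->refcount, 1) == 1) {
        free(p);
        return 1;
    }
    return 0;
}
```

As long as the handler holds its own reference, another path dropping the owner reference does not free the object; the memory is released only when the handler does its final put before returning.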
When systemtap.unprivileged/pr16806.exp is run with the
uprobe_report_clone() changes, the test no longer crashes the system, but 2 new
problems popped up:
[ BUG: lock held when returning to user space! ]
------------------------------------------------
loop/4866 is leaving the kernel with locks still held!
2 locks held by loop/4866:
#0: (&uproc->rwsem){+++++.}, at: [<000003e00108ad32>] uprobe_report_signal+0x296/0xe80 [uprobes]
#1: (&slot->rwsem){+.+...}, at: [<000003e0010898e0>] uprobe_find_insn_slot+0x160/0x33c [uprobes]
Fix #2: To fix the first lock-held problem above, I took a look at
uprobe_report_signal(). That code is quite complicated, and has 'goto'
statements that jump from one switch statement case to another. I could see
several paths through that code that could lead to uproc->rwsem still being
held on exit.
The changes to uprobe_report_signal() attempt to fix those issues.
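One common way to rule out "lock still held on exit" paths in a function like this is to funnel every return through a single unlock label. The sketch below shows that shape only; it is not the actual change made to uprobe_report_signal(). The demo_lock type, the demo_* functions and the states are hypothetical, and a plain flag stands in for the rwsem so the example runs in userspace.

```c
#include <assert.h>

/* Toy lock: a flag standing in for uproc->rwsem (hypothetical). */
struct demo_lock {
    int held;
};

static void demo_down_write(struct demo_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void demo_up_write(struct demo_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Hypothetical handler states, loosely modelled on a switch-driven handler. */
enum demo_state { DEMO_IDLE, DEMO_BREAKPOINT, DEMO_SINGLESTEP };

/*
 * Every path, including the early error path, leaves through out_unlock,
 * so the lock cannot remain held when the function returns.
 */
static int demo_report_signal(struct demo_lock *l, enum demo_state st,
                              int fail_early)
{
    int ret = 0;

    demo_down_write(l);
    switch (st) {
    case DEMO_BREAKPOINT:
        if (fail_early) {
            ret = -1;
            goto out_unlock;
        }
        ret = 1;
        break;
    case DEMO_SINGLESTEP:
        ret = 2;
        break;
    default:
        break;
    }
out_unlock:
    demo_up_write(l);
    return ret;
}
```

With a single exit label, adding a new case or error path cannot silently skip the unlock, which is the property the lockdep report above was checking for.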
Fix #3: uprobe_find_insn_slot() always returns the slot read-locked. It is only
called by uprobe_get_insn_slot(), which returns that read-locked slot.
uprobe_get_insn_slot() is only called by uprobe_pre_ssout(), which wasn't
unlocking the slot. I changed uprobe_pre_ssout() to unlock the slot before
returning.
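The ownership rule in Fix #3 (a lookup that hands back a read-locked object, so the caller must release it) can be illustrated with a small userspace sketch. It assumes a hypothetical demo_slot type with a reader counter in place of slot->rwsem; the demo_* names are not from the uprobes source.

```c
#include <assert.h>
#include <stddef.h>

/* Toy slot: 'readers' stands in for a read-held slot->rwsem (hypothetical). */
struct demo_slot {
    int owner;
    int readers;
};

/* Like the lookup described above: returns the matching slot read-locked. */
static struct demo_slot *demo_find_slot(struct demo_slot *slots, size_t n,
                                        int owner)
{
    for (size_t i = 0; i < n; i++) {
        if (slots[i].owner == owner) {
            slots[i].readers++;     /* stands in for down_read() */
            return &slots[i];
        }
    }
    return NULL;
}

/* The caller owns the read lock and must drop it before returning. */
static int demo_pre_ssout(struct demo_slot *slots, size_t n, int owner)
{
    struct demo_slot *s = demo_find_slot(slots, n, owner);

    if (!s)
        return -1;
    /* ... use the slot while it is read-locked ... */
    s->readers--;                   /* stands in for up_read() */
    return 0;
}
```

After demo_pre_ssout() returns, no slot has a nonzero reader count, which mirrors the requirement that no lock is held when returning to user space.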
With this patch the test passes fine. However, I'll need to do more testing to
ensure I haven't broken anything else.
* [Bug runtime/20742] kernel pointer dereference BUG on RHEL6 s390x
From: dsmith at redhat dot com @ 2016-11-10 16:00 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=20742
--- Comment #2 from David Smith <dsmith at redhat dot com> ---
Unfortunately, this patch causes the at_var.exp test to hang, so I'll need
to do more work here.