* [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels
@ 2023-04-29 22:03 agentzh at gmail dot com
2023-04-30 5:51 ` [Bug runtime/30405] " agentzh at gmail dot com
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-04-29 22:03 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=30405
Bug ID: 30405
Summary: Kernel errors with Fedora 36's 6.2.12 debug kernels
Product: systemtap
Version: unspecified
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: runtime
Assignee: systemtap at sourceware dot org
Reporter: agentzh at gmail dot com
Target Milestone: ---
We noted that when running a minimal kprobes stap script on Fedora 36 x86_64's
latest kernel-debug kernel, 6.2.12-100.fc36.x86_64+debug, the kernel always
reports the following error in dmesg:
```
[ 89.347060] stap_f93d809e35e31d9f81df52024bfed1b5__230 (a.stp): systemtap:
4.9/0.188, base: ffffffffc06ec000, memory:
52data/28text/25ctx/524390net/389alloc kb, probes: 1
[ 89.347073] BUG: sleeping function called from invalid context at
kernel/kallsyms.c:305
[ 89.347076] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1831,
name: stapio
[ 89.347078] preempt_count: 1, expected: 0
[ 89.347080] RCU nest depth: 0, expected: 0
[ 89.347081] 3 locks held by stapio/1831:
[ 89.347083] #0: ffff88810d1ceef8 (&f->f_pos_lock){+.+.}-{3:3}, at:
__fdget_pos+0x52/0x60
[ 89.347094] #1: ffff88816f73c498 (sb_writers#3){.+.+}-{0:0}, at:
ksys_write+0x74/0xf0
[ 89.347104] #2: ffffffffc06f9330 (cmd_mutex){+.+.}-{3:3}, at:
_stp_ctl_write_cmd+0xe2/0xe10 [stap_f93d809e35e31d9f81df52024bfed1b5__230]
[ 89.347116] Preemption disabled at:
[ 89.347117] [<ffffffffc06f0de8>] _stp_ctl_write_cmd+0xd48/0xe10
[stap_f93d809e35e31d9f81df52024bfed1b5__230]
[ 89.347124] CPU: 20 PID: 1831 Comm: stapio Tainted: G OE
6.2.12-100.fc36.x86_64+debug #1
[ 89.347127] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.15.0-1.fc35 04/01/2014
[ 89.347129] Call Trace:
[ 89.347131] <TASK>
[ 89.347133] dump_stack_lvl+0x71/0x90
[ 89.347143] __might_resched+0x1c2/0x2e0
[ 89.347148] ? __pfx_stapkp_symbol_callback+0x10/0x10
[stap_f93d809e35e31d9f81df52024bfed1b5__230]
[ 89.347154] kallsyms_on_each_symbol+0x6a/0xf0
[ 89.347184] _stp_ctl_write_cmd+0xd62/0xe10
[stap_f93d809e35e31d9f81df52024bfed1b5__230]
[ 89.347189] ? lock_acquire+0xe2/0x2c0
[ 89.347196] proc_reg_write+0x53/0xa0
[ 89.347200] vfs_write+0xea/0x530
[ 89.347203] ? __fdget_pos+0x52/0x60
[ 89.347211] ksys_write+0x74/0xf0
[ 89.347215] do_syscall_64+0x58/0x80
[ 89.347221] ? kvm_sched_clock_read+0x14/0x40
[ 89.347224] ? sched_clock_cpu+0xb/0xc0
[ 89.347227] ? lock_release+0x15d/0x400
[ 89.347230] ? _raw_spin_unlock_irq+0x24/0x50
[ 89.347237] ? _raw_spin_unlock_irq+0x24/0x50
[ 89.347240] ? lockdep_hardirqs_on+0x7d/0x100
[ 89.347245] ? _raw_spin_unlock_irq+0x34/0x50
[ 89.347248] ? syscall_exit_to_user_mode+0xe/0x50
[ 89.347252] ? do_syscall_64+0x67/0x80
[ 89.347254] ? lockdep_hardirqs_on+0x7d/0x100
[ 89.347256] ? do_syscall_64+0x67/0x80
[ 89.347259] ? lockdep_hardirqs_on+0x7d/0x100
[ 89.347261] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 89.347267] RIP: 0033:0x7f0d4ccdbc6f
[ 89.347270] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 09 76 f8 ff 48
8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c 76 f8 ff 48
[ 89.347272] RSP: 002b:00007ffe6bb215c0 EFLAGS: 00000293 ORIG_RAX:
0000000000000001
[ 89.347275] RAX: ffffffffffffffda RBX: 0000000000000008 RCX:
00007f0d4ccdbc6f
[ 89.347277] RDX: 000000000000000c RSI: 00007ffe6bb215f0 RDI:
0000000000000004
[ 89.347278] RBP: 000000000000000c R08: 0000000000000000 R09:
00007ffe6bb2077f
[ 89.347280] R10: 0000000000000008 R11: 0000000000000293 R12:
00007ffe6bb21a50
[ 89.347281] R13: 00007ffe6bb23c80 R14: 0000000000000001 R15:
00007ffe6bb21ad4
[ 89.347291] </TASK>
```
The a.stp file is defined as
```
probe kprobe.function("finish_task_switch") ?,
      kprobe.function("finish_task_switch.*") ? {
    println("Hit");
    exit();
}
```
I'm using the latest master branch of the upstream systemtap repo as of this
writing (commit 418f0a45ca447).
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug runtime/30405] Kernel errors with Fedora 36's 6.2.12 debug kernels
2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
@ 2023-04-30 5:51 ` agentzh at gmail dot com
2023-04-30 21:27 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() " agentzh at gmail dot com
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-04-30 5:51 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=30405
--- Comment #1 from agentzh <agentzh at gmail dot com> ---
The following patch seems to fix the error itself:
```
diff --git a/runtime/linux/kprobes.c b/runtime/linux/kprobes.c
index c0f4f5c0f..72f4b148e 100644
--- a/runtime/linux/kprobes.c
+++ b/runtime/linux/kprobes.c
@@ -819,9 +819,7 @@ stapkp_init(struct stap_kprobe_probe *probes,
 #ifdef STAPCONF_MODULE_MUTEX
 	mutex_lock(&module_mutex);
 #endif
-	preempt_disable();
 	kallsyms_on_each_symbol(stapkp_symbol_callback, &sd);
-	preempt_enable();
 #ifdef STAPCONF_MODULE_MUTEX
 	mutex_unlock(&module_mutex);
 #endif
```
But it seems to effectively revert the earlier commit that fixed a soft lockup
in PR20735:
```
commit 58c9bd1563aece0c020d9c8da5cf8db8ef028439
Author: David Smith <dsmith@redhat.com>
Date:   Thu Nov 10 09:59:16 2016 -0600

    Fix PR20735 by updating kprobes.c to avoid a soft lockup.

    * runtime/linux/kprobes.c: Instead of grabbing the module mutex around
    calling kallsyms_on_each_symbol(), just disable/enable preemption
    instead to avoid a soft lockup.
```
What should we do then?
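To make the conflict concrete, here is a small user-space model of what the trace above shows: a might-sleep check reached from the symbol walk while preempt_count is 1. This is only a sketch and not SystemTap or kernel code. Every `model_*` name, the three-entry symbol table, `count_matches()`, and `walk_symbols()` are hypothetical stand-ins assumed for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model state: not the kernel's real counters. */
static int preempt_count;   /* nesting depth of "preemption disabled" */
static int bug_reports;     /* how many times the check complained */

static void model_preempt_disable(void) { preempt_count++; }
static void model_preempt_enable(void)  { preempt_count--; }

/* Stand-in for the might-sleep check: complain if we could sleep
 * while preemption is disabled. */
static void model_might_resched(const char *where)
{
    if (preempt_count != 0) {
        bug_reports++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s "
                "(preempt_count: %d, expected: 0)\n",
                where, preempt_count);
    }
}

/* Stand-in for a symbol walker that offers to reschedule after
 * every symbol it visits. */
static int model_on_each_symbol(int (*fn)(void *data, const char *name),
                                void *data)
{
    static const char *const symbols[] = {
        "finish_task_switch",
        "finish_task_switch.isra.0",
        "schedule",
    };
    size_t i;

    for (i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) {
        int ret = fn(data, symbols[i]);
        if (ret != 0)
            return ret;
        model_might_resched("model_on_each_symbol");
    }
    return 0;
}

/* Callback: count symbols whose name starts with finish_task_switch. */
static int count_matches(void *data, const char *name)
{
    const char *prefix = "finish_task_switch";

    if (strncmp(name, prefix, strlen(prefix)) == 0)
        (*(int *)data)++;
    return 0;
}

/* Walk the symbols with or without "preemption disabled" and return
 * the number of invalid-context reports the model produced. */
int walk_symbols(int disable_preemption, int *matches)
{
    bug_reports = 0;
    *matches = 0;

    if (disable_preemption)
        model_preempt_disable();
    model_on_each_symbol(count_matches, matches);
    if (disable_preemption)
        model_preempt_enable();

    return bug_reports;
}
```

Calling `walk_symbols(1, &n)` makes the model complain once per visited symbol, while `walk_symbols(0, &n)` produces no reports and finds the same matches, which mirrors the effect of dropping the preempt_disable()/preempt_enable() pair. The model says nothing about the soft lockup that PR20735 addressed.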
* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2.12 debug kernels
2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
2023-04-30 5:51 ` [Bug runtime/30405] " agentzh at gmail dot com
@ 2023-04-30 21:27 ` agentzh at gmail dot com
2023-05-02 22:44 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 " agentzh at gmail dot com
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-04-30 21:27 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=30405
agentzh <agentzh at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Kernel errors with Fedora |Kernel errors in
|36's 6.2.12 debug kernels |kallsyms_on_each_symbol()
| |with Fedora 36's 6.2.12
| |debug kernels
* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
2023-04-30 5:51 ` [Bug runtime/30405] " agentzh at gmail dot com
2023-04-30 21:27 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() " agentzh at gmail dot com
@ 2023-05-02 22:44 ` agentzh at gmail dot com
2023-05-02 22:45 ` agentzh at gmail dot com
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-05-02 22:44 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=30405
agentzh <agentzh at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Kernel errors in |Kernel errors in
|kallsyms_on_each_symbol() |kallsyms_on_each_symbol()
|with Fedora 36's 6.2.12 |with Fedora 36's 6.2/6.1
|debug kernels |debug kernels
* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
` (2 preceding siblings ...)
2023-05-02 22:44 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 " agentzh at gmail dot com
@ 2023-05-02 22:45 ` agentzh at gmail dot com
2023-05-06 22:23 ` agentzh at gmail dot com
2023-05-09 19:53 ` agentzh at gmail dot com
5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-05-02 22:45 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=30405
--- Comment #2 from agentzh <agentzh at gmail dot com> ---
This problem also happens on Fedora 36 x86_64's 6.1.18 kernel.
* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
` (3 preceding siblings ...)
2023-05-02 22:45 ` agentzh at gmail dot com
@ 2023-05-06 22:23 ` agentzh at gmail dot com
2023-05-09 19:53 ` agentzh at gmail dot com
5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-05-06 22:23 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=30405
--- Comment #3 from agentzh <agentzh at gmail dot com> ---
Got a related CPU lockup on an older kernel (5.11.22) on x86_64. The dmesg error
is here:
https://gist.github.com/agentzh/cbe640331de5f849b360b9415a6d312e
And the following patch fixes this lockup (as well as the original "sleeping
function called from invalid context" error):
https://gist.github.com/agentzh/ba69ca1acac1b475bbf0951c7c09a245
* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
` (4 preceding siblings ...)
2023-05-06 22:23 ` agentzh at gmail dot com
@ 2023-05-09 19:53 ` agentzh at gmail dot com
5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-05-09 19:53 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=30405
agentzh <agentzh at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|UNCONFIRMED |RESOLVED
--- Comment #4 from agentzh <agentzh at gmail dot com> ---
Patch committed as commit 13c18518da.