public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels
@ 2023-04-29 22:03 agentzh at gmail dot com
  2023-04-30  5:51 ` [Bug runtime/30405] " agentzh at gmail dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-04-29 22:03 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30405

            Bug ID: 30405
           Summary: Kernel errors with Fedora 36's 6.2.12 debug kernels
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: agentzh at gmail dot com
  Target Milestone: ---

We noted that when using Fedora 36 x86_64's latest kernel-debug kernel,
6.2.12-100.fc36.x86_64+debug, when using a minimal kprobes stap script, the
kernel always reports the following error in dmesg:

```
[   89.347060] stap_f93d809e35e31d9f81df52024bfed1b5__230 (a.stp): systemtap:
4.9/0.188, base: ffffffffc06ec000, memory:
52data/28text/25ctx/524390net/389alloc kb, probes: 1
[   89.347073] BUG: sleeping function called from invalid context at
kernel/kallsyms.c:305
[   89.347076] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1831,
name: stapio
[   89.347078] preempt_count: 1, expected: 0
[   89.347080] RCU nest depth: 0, expected: 0
[   89.347081] 3 locks held by stapio/1831:
[   89.347083]  #0: ffff88810d1ceef8 (&f->f_pos_lock){+.+.}-{3:3}, at:
__fdget_pos+0x52/0x60
[   89.347094]  #1: ffff88816f73c498 (sb_writers#3){.+.+}-{0:0}, at:
ksys_write+0x74/0xf0
[   89.347104]  #2: ffffffffc06f9330 (cmd_mutex){+.+.}-{3:3}, at:
_stp_ctl_write_cmd+0xe2/0xe10 [stap_f93d809e35e31d9f81df52024bfed1b5__230]
[   89.347116] Preemption disabled at:
[   89.347117] [<ffffffffc06f0de8>] _stp_ctl_write_cmd+0xd48/0xe10
[stap_f93d809e35e31d9f81df52024bfed1b5__230]
[   89.347124] CPU: 20 PID: 1831 Comm: stapio Tainted: G           OE     
6.2.12-100.fc36.x86_64+debug #1
[   89.347127] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.15.0-1.fc35 04/01/2014
[   89.347129] Call Trace:
[   89.347131]  <TASK>
[   89.347133]  dump_stack_lvl+0x71/0x90
[   89.347143]  __might_resched+0x1c2/0x2e0
[   89.347148]  ? __pfx_stapkp_symbol_callback+0x10/0x10
[stap_f93d809e35e31d9f81df52024bfed1b5__230]
[   89.347154]  kallsyms_on_each_symbol+0x6a/0xf0
[   89.347184]  _stp_ctl_write_cmd+0xd62/0xe10
[stap_f93d809e35e31d9f81df52024bfed1b5__230]
[   89.347189]  ? lock_acquire+0xe2/0x2c0
[   89.347196]  proc_reg_write+0x53/0xa0
[   89.347200]  vfs_write+0xea/0x530
[   89.347203]  ? __fdget_pos+0x52/0x60
[   89.347211]  ksys_write+0x74/0xf0
[   89.347215]  do_syscall_64+0x58/0x80
[   89.347221]  ? kvm_sched_clock_read+0x14/0x40
[   89.347224]  ? sched_clock_cpu+0xb/0xc0
[   89.347227]  ? lock_release+0x15d/0x400
[   89.347230]  ? _raw_spin_unlock_irq+0x24/0x50
[   89.347237]  ? _raw_spin_unlock_irq+0x24/0x50
[   89.347240]  ? lockdep_hardirqs_on+0x7d/0x100
[   89.347245]  ? _raw_spin_unlock_irq+0x34/0x50
[   89.347248]  ? syscall_exit_to_user_mode+0xe/0x50
[   89.347252]  ? do_syscall_64+0x67/0x80
[   89.347254]  ? lockdep_hardirqs_on+0x7d/0x100
[   89.347256]  ? do_syscall_64+0x67/0x80
[   89.347259]  ? lockdep_hardirqs_on+0x7d/0x100
[   89.347261]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   89.347267] RIP: 0033:0x7f0d4ccdbc6f
[   89.347270] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 09 76 f8 ff 48
8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c 76 f8 ff 48
[   89.347272] RSP: 002b:00007ffe6bb215c0 EFLAGS: 00000293 ORIG_RAX:
0000000000000001
[   89.347275] RAX: ffffffffffffffda RBX: 0000000000000008 RCX:
00007f0d4ccdbc6f
[   89.347277] RDX: 000000000000000c RSI: 00007ffe6bb215f0 RDI:
0000000000000004
[   89.347278] RBP: 000000000000000c R08: 0000000000000000 R09:
00007ffe6bb2077f
[   89.347280] R10: 0000000000000008 R11: 0000000000000293 R12:
00007ffe6bb21a50
[   89.347281] R13: 00007ffe6bb23c80 R14: 0000000000000001 R15:
00007ffe6bb21ad4
[   89.347291]  </TASK>
```

The a.stp file is defined as

```
probe kprobe.function("finish_task_switch") ?,
kprobe.function("finish_task_switch.*") ? {
    println("Hit");
    exit();
}
```

I'm using the latest master branch of the upstream systemtap repo as of this
wring (commit 418f0a45ca447).

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/30405] Kernel errors with Fedora 36's 6.2.12 debug kernels
  2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
@ 2023-04-30  5:51 ` agentzh at gmail dot com
  2023-04-30 21:27 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() " agentzh at gmail dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-04-30  5:51 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30405

--- Comment #1 from agentzh <agentzh at gmail dot com> ---
The following patch seems to fix the error itself:

```
diff --git a/runtime/linux/kprobes.c b/runtime/linux/kprobes.c
index c0f4f5c0f..72f4b148e 100644
--- a/runtime/linux/kprobes.c
+++ b/runtime/linux/kprobes.c
@@ -819,9 +819,7 @@ stapkp_init(struct stap_kprobe_probe *probes,
 #ifdef STAPCONF_MODULE_MUTEX
        mutex_lock(&module_mutex);
 #endif
-       preempt_disable();
        kallsyms_on_each_symbol(stapkp_symbol_callback, &sd);
-       preempt_enable();
 #ifdef STAPCONF_MODULE_MUTEX
        mutex_unlock(&module_mutex);
 #endif
```

But it seems to effectively revert the previous commit for fixing a softlockup
in PR20735:

```
commit 58c9bd1563aece0c020d9c8da5cf8db8ef028439
Author: David Smith <dsmith@redhat.com>
Date:   Thu Nov 10 09:59:16 2016 -0600

    Fix PR20735 by updating kprobes.c to avoid a soft lockup.

    * runtime/linux/kprobes.c: Instead of grabbing the module mutex around
      calling kallsyms_on_each_symbol(), just disable/enable preemption
      instead to avoid a soft lockup.
```

What should we do then?

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2.12 debug kernels
  2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
  2023-04-30  5:51 ` [Bug runtime/30405] " agentzh at gmail dot com
@ 2023-04-30 21:27 ` agentzh at gmail dot com
  2023-05-02 22:44 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 " agentzh at gmail dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-04-30 21:27 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30405

agentzh <agentzh at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Kernel errors with Fedora   |Kernel errors in
                   |36's 6.2.12 debug kernels   |kallsyms_on_each_symbol()
                   |                            |with Fedora 36's 6.2.12
                   |                            |debug kernels

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
  2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
  2023-04-30  5:51 ` [Bug runtime/30405] " agentzh at gmail dot com
  2023-04-30 21:27 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() " agentzh at gmail dot com
@ 2023-05-02 22:44 ` agentzh at gmail dot com
  2023-05-02 22:45 ` agentzh at gmail dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-05-02 22:44 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30405

agentzh <agentzh at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Kernel errors in            |Kernel errors in
                   |kallsyms_on_each_symbol()   |kallsyms_on_each_symbol()
                   |with Fedora 36's 6.2.12     |with Fedora 36's 6.2/6.1
                   |debug kernels               |debug kernels

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
  2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
                   ` (2 preceding siblings ...)
  2023-05-02 22:44 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 " agentzh at gmail dot com
@ 2023-05-02 22:45 ` agentzh at gmail dot com
  2023-05-06 22:23 ` agentzh at gmail dot com
  2023-05-09 19:53 ` agentzh at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-05-02 22:45 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30405

--- Comment #2 from agentzh <agentzh at gmail dot com> ---
This problem also happens on Fedora 36 x86_64's 6.1.18 kernel.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
  2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
                   ` (3 preceding siblings ...)
  2023-05-02 22:45 ` agentzh at gmail dot com
@ 2023-05-06 22:23 ` agentzh at gmail dot com
  2023-05-09 19:53 ` agentzh at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-05-06 22:23 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30405

--- Comment #3 from agentzh <agentzh at gmail dot com> ---
Got a related CPU lockup on older kernel (5.11.22) on x86_64. The dmesg error
is here:

https://gist.github.com/agentzh/cbe640331de5f849b360b9415a6d312e

And the following patch fixes this lockup (as well as the original "sleeping
function called from invalid context" error:

https://gist.github.com/agentzh/ba69ca1acac1b475bbf0951c7c09a245

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 debug kernels
  2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
                   ` (4 preceding siblings ...)
  2023-05-06 22:23 ` agentzh at gmail dot com
@ 2023-05-09 19:53 ` agentzh at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: agentzh at gmail dot com @ 2023-05-09 19:53 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30405

agentzh <agentzh at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from agentzh <agentzh at gmail dot com> ---
Patch committed as commit 13c18518da.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-05-09 19:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-29 22:03 [Bug runtime/30405] New: Kernel errors with Fedora 36's 6.2.12 debug kernels agentzh at gmail dot com
2023-04-30  5:51 ` [Bug runtime/30405] " agentzh at gmail dot com
2023-04-30 21:27 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() " agentzh at gmail dot com
2023-05-02 22:44 ` [Bug runtime/30405] Kernel errors in kallsyms_on_each_symbol() with Fedora 36's 6.2/6.1 " agentzh at gmail dot com
2023-05-02 22:45 ` agentzh at gmail dot com
2023-05-06 22:23 ` agentzh at gmail dot com
2023-05-09 19:53 ` agentzh at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).