public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug runtime/30393] New: New lockdep error on a fedora debug kernel
@ 2023-04-26 21:47 agentzh at gmail dot com
  2023-04-27 17:36 ` [Bug runtime/30393] " agentzh at gmail dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: agentzh at gmail dot com @ 2023-04-26 21:47 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30393

            Bug ID: 30393
           Summary: New lockdep error on a fedora debug kernel
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: agentzh at gmail dot com
  Target Milestone: ---

I've noticed the following lockdep error while running tests against the latest
git master branch of systemtap (commit 71f89cccaf46):

```
Apr 26 14:29:36 fed32-dev kernel: CPU: 28 PID: 218395 Comm: staprun Tainted: G 
         OE     5.11.22-100.fc32.x86_64+debug #1
Apr 26 14:29:36 fed32-dev kernel: =============================
Apr 26 14:29:36 fed32-dev kernel: WARNING: suspicious RCU usage
Apr 26 14:29:36 fed32-dev kernel: 5.11.22-100.fc32.x86_64+debug #1 Tainted: G  
        OE    
Apr 26 14:29:36 fed32-dev kernel: Hardware name: QEMU Standard PC (i440FX +
PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
Apr 26 14:29:36 fed32-dev kernel: -----------------------------
Apr 26 14:29:36 fed32-dev kernel: Call Trace:
Apr 26 14:29:36 fed32-dev kernel: kernel/sched/core.c:7927 Illegal context
switch in RCU-bh read-side critical section!
Apr 26 14:29:36 fed32-dev kernel:  <IRQ>
Apr 26 14:29:36 fed32-dev kernel: 
                                  other info that might help us debug this:
Apr 26 14:29:36 fed32-dev kernel: 
                                  rcu_scheduler_active = 2, debug_locks = 0
Apr 26 14:29:36 fed32-dev kernel:  dump_stack+0x8b/0xb0
Apr 26 14:29:36 fed32-dev kernel:  save_trace.cold+0x13/0x18
Apr 26 14:29:36 fed32-dev kernel:  ? unwind_next_frame+0x41e/0x560
Apr 26 14:29:36 fed32-dev kernel: 1 lock held by cc1/217697:
Apr 26 14:29:36 fed32-dev kernel:  ? create_prof_cpu_mask+0x20/0x20
Apr 26 14:29:36 fed32-dev kernel:  #0: ffff888112c35930
Apr 26 14:29:36 fed32-dev kernel:  mark_lock+0x1b2/0x810
Apr 26 14:29:36 fed32-dev kernel:  ? link_path_walk.part.0+0x20d/0x340
Apr 26 14:29:36 fed32-dev kernel:  __lock_acquire+0x7f4/0x1e60
Apr 26 14:29:36 fed32-dev kernel:  (
Apr 26 14:29:36 fed32-dev kernel:  ? add_lock_to_list.constprop.0+0x94/0xf0
Apr 26 14:29:36 fed32-dev kernel:  lock_acquire+0xc8/0x380
Apr 26 14:29:36 fed32-dev kernel:  ? __wake_up_common_lock+0x52/0x90
Apr 26 14:29:36 fed32-dev kernel: &mm->mmap_lock
Apr 26 14:29:36 fed32-dev kernel:  ? lock_acquire+0xc8/0x380
Apr 26 14:29:36 fed32-dev kernel:  _raw_spin_lock_irqsave+0x4d/0x90
Apr 26 14:29:36 fed32-dev kernel: #2
Apr 26 14:29:36 fed32-dev kernel:  ? __wake_up_common_lock+0x52/0x90
Apr 26 14:29:36 fed32-dev kernel:  __wake_up_common_lock+0x52/0x90
Apr 26 14:29:36 fed32-dev kernel:  ? _stp_request_exit+0x2f/0x2f
[stap_96bbb4216c211c351bf940dda33dc74_218395]
Apr 26 14:29:36 fed32-dev kernel: ){++++}-{3:3}
Apr 26 14:29:36 fed32-dev kernel:  ? _stp_request_exit+0x2f/0x2f
[stap_96bbb4216c211c351bf940dda33dc74_218395]
Apr 26 14:29:36 fed32-dev kernel: , at: do_user_addr_fault+0x125/0x4c0
Apr 26 14:29:36 fed32-dev kernel:  _stp_ctl_work_callback+0x8a/0xb5
[stap_96bbb4216c211c351bf940dda33dc74_218395]
Apr 26 14:29:36 fed32-dev kernel: 
                                  stack backtrace:
Apr 26 14:29:36 fed32-dev kernel:  call_timer_fn+0xa9/0x2c0
Apr 26 14:29:36 fed32-dev kernel:  __run_timers.part.0+0x1e4/0x2d0
Apr 26 14:29:36 fed32-dev kernel:  ? kvm_clock_get_cycles+0x14/0x20
Apr 26 14:29:36 fed32-dev kernel:  run_timer_softirq+0x26/0x50
Apr 26 14:29:36 fed32-dev kernel:  __do_softirq+0xdd/0x4d5
Apr 26 14:29:36 fed32-dev kernel:  asm_call_irq_on_stack+0xf/0x20
Apr 26 14:29:36 fed32-dev kernel:  </IRQ>
Apr 26 14:29:36 fed32-dev kernel:  do_softirq_own_stack+0x5b/0x70
Apr 26 14:29:36 fed32-dev kernel:  __irq_exit_rcu+0xe4/0x130
Apr 26 14:29:36 fed32-dev kernel:  irq_exit_rcu+0xa/0x20
Apr 26 14:29:36 fed32-dev kernel:  sysvec_apic_timer_interrupt+0x4b/0xa0
Apr 26 14:29:36 fed32-dev kernel:  asm_sysvec_apic_timer_interrupt+0x12/0x20
Apr 26 14:29:36 fed32-dev kernel: RIP: 0010:lock_is_held_type+0xe1/0x110
Apr 26 14:29:36 fed32-dev kernel: Code: 41 0f b6 46 32 45 31 c0 83 e0 03 44 39
e8 41 0f 94 c0 b8 ff ff ff ff 65 0f c1 05 ba a6 49 7e 83 f8 01 75 26 48 8b 3c
24 57 9d <0f> 1f 44 00 00 48 83 c4 08 44 89 c0 5b 5d 41 5c 41 5d 41 5e 41 5f
Apr 26 14:29:36 fed32-dev kernel: RSP: 0018:ffffc900024abbe0 EFLAGS: 00000246
Apr 26 14:29:36 fed32-dev kernel: RAX: 0000000000000001 RBX: ffffffff82d8fbc0
RCX: 0000000000000001
Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000000 RSI: ffffffff82d8fbc0
RDI: 0000000000000246
Apr 26 14:29:36 fed32-dev kernel: RBP: ffff8881c24c8000 R08: 0000000000000000
R09: 0000000000000001
Apr 26 14:29:36 fed32-dev kernel: R10: 0000000000000246 R11: 0000000000000000
R12: ffff8881c24c8de0
Apr 26 14:29:36 fed32-dev kernel: R13: 00000000ffffffff R14: ffff8881c24c8de0
R15: 0000000000000001
Apr 26 14:29:36 fed32-dev kernel:  rcu_read_lock_sched_held+0x3f/0x80
Apr 26 14:29:36 fed32-dev kernel:  ? dput+0xb4/0x480
Apr 26 14:29:36 fed32-dev kernel:  lock_release+0x359/0x400
Apr 26 14:29:36 fed32-dev kernel:  step_into+0xa6/0x750
Apr 26 14:29:36 fed32-dev kernel:  ? kernfs_dop_revalidate+0xb8/0xd0
Apr 26 14:29:36 fed32-dev kernel:  walk_component+0x62/0x190
Apr 26 14:29:36 fed32-dev kernel:  link_path_walk.part.0+0x20d/0x340
Apr 26 14:29:36 fed32-dev kernel:  path_openat+0x9f/0xa50
Apr 26 14:29:36 fed32-dev kernel:  do_filp_open+0x75/0x100
Apr 26 14:29:36 fed32-dev kernel:  ? do_raw_spin_unlock+0x4b/0xa0
Apr 26 14:29:36 fed32-dev kernel:  ? _raw_spin_unlock+0x1f/0x30
Apr 26 14:29:36 fed32-dev kernel:  ? alloc_fd+0xf6/0x1e0
Apr 26 14:29:36 fed32-dev kernel:  do_sys_openat2+0x7b/0x130
Apr 26 14:29:36 fed32-dev kernel:  __x64_sys_openat+0x46/0x70
Apr 26 14:29:36 fed32-dev kernel:  do_syscall_64+0x33/0x40
Apr 26 14:29:36 fed32-dev kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 26 14:29:36 fed32-dev kernel: RIP: 0033:0x7fac564dcfab
Apr 26 14:29:36 fed32-dev kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64
8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00
00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
Apr 26 14:29:36 fed32-dev kernel: RSP: 002b:00007ffdf1017e90 EFLAGS: 00000246
ORIG_RAX: 0000000000000101
Apr 26 14:29:36 fed32-dev kernel: RAX: ffffffffffffffda RBX: 00000000010cd440
RCX: 00007fac564dcfab
Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000000 RSI: 00000000010b0a40
RDI: 00000000ffffff9c
Apr 26 14:29:36 fed32-dev kernel: RBP: 00000000010b0a40 R08: 0000000000000008
R09: 0000000000000001
Apr 26 14:29:36 fed32-dev kernel: R10: 0000000000000000 R11: 0000000000000246
R12: 0000000000000000
Apr 26 14:29:36 fed32-dev kernel: R13: 00000000010cd440 R14: 0000000000000001
R15: 0000000000000000
Apr 26 14:29:36 fed32-dev kernel: CPU: 2 PID: 217697 Comm: cc1 Tainted: G      
    OE     5.11.22-100.fc32.x86_64+debug #1
Apr 26 14:29:36 fed32-dev kernel: Hardware name: QEMU Standard PC (i440FX +
PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
Apr 26 14:29:36 fed32-dev kernel: Call Trace:
Apr 26 14:29:36 fed32-dev kernel:  dump_stack+0x8b/0xb0
Apr 26 14:29:36 fed32-dev kernel:  ___might_sleep+0x180/0x190
Apr 26 14:29:36 fed32-dev kernel:  do_user_addr_fault+0x145/0x4c0
Apr 26 14:29:36 fed32-dev kernel:  exc_page_fault+0x67/0x2a0
Apr 26 14:29:36 fed32-dev kernel:  ? asm_exc_page_fault+0x8/0x30
Apr 26 14:29:36 fed32-dev kernel:  asm_exc_page_fault+0x1e/0x30
Apr 26 14:29:36 fed32-dev kernel: RIP: 0033:0x10655fa
Apr 26 14:29:36 fed32-dev kernel: Code: 54 f8 03 00 00 41 8b 79 54 b9 01 00 00
00 31 d2 31 f6 48 83 c7 08 e8 65 9d e4 ff 49 8b 7c 24 18 49 8b 2f 49 89 c0 4c
8b 57 48 <4c> 89 10 48 89 47 48 48 8d 40 08 c7 47 50 28 00 00 00 4d 89 78 28
Apr 26 14:29:36 fed32-dev kernel: RSP: 002b:00007ffcf576c1b0 EFLAGS: 00010206
Apr 26 14:29:36 fed32-dev kernel: RAX: 00007fb0faabc000 RBX: 00007fb0fad98be0
RCX: 000000000000000c
Apr 26 14:29:36 fed32-dev kernel: RDX: 0000000000000006 RSI: 0000000000000000
RDI: 00007fb0fada8850
Apr 26 14:29:36 fed32-dev kernel: RBP: 00007fb0fb88d600 R08: 00007fb0faabc000
R09: 0000000000003bb5
Apr 26 14:29:36 fed32-dev kernel: R10: 00007fb0faabb400 R11: 0000000000000003
R12: 00007fb0fb87abb0
Apr 26 14:29:36 fed32-dev kernel: R13: 0000000000000002 R14: 00007fb0faabb7c8
R15: 00007fb0fad98c68
```

I'm not sure if this a kernel bug or a stap runtime bug. Any help here will be
appreciated. Thanks!

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/30393] New lockdep error on a fedora debug kernel
  2023-04-26 21:47 [Bug runtime/30393] New: New lockdep error on a fedora debug kernel agentzh at gmail dot com
@ 2023-04-27 17:36 ` agentzh at gmail dot com
  2023-04-28  0:48 ` agentzh at gmail dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: agentzh at gmail dot com @ 2023-04-27 17:36 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30393

--- Comment #1 from agentzh <agentzh at gmail dot com> ---
This issue is relatively hard to reproduce, and thus, we cannot prepare a
minimal test case for this.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/30393] New lockdep error on a fedora debug kernel
  2023-04-26 21:47 [Bug runtime/30393] New: New lockdep error on a fedora debug kernel agentzh at gmail dot com
  2023-04-27 17:36 ` [Bug runtime/30393] " agentzh at gmail dot com
@ 2023-04-28  0:48 ` agentzh at gmail dot com
  2023-04-30 21:29 ` [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/_stp_request_exit() " agentzh at gmail dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: agentzh at gmail dot com @ 2023-04-28  0:48 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30393

--- Comment #2 from agentzh <agentzh at gmail dot com> ---
Seems like _stp_ctl_work_callback -> _stp_request_exit -> _stp_ctl_send_notify
is called from within an IRQ context which should never happen. So it looks
like stap runtime bug?

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/_stp_request_exit() on a fedora debug kernel
  2023-04-26 21:47 [Bug runtime/30393] New: New lockdep error on a fedora debug kernel agentzh at gmail dot com
  2023-04-27 17:36 ` [Bug runtime/30393] " agentzh at gmail dot com
  2023-04-28  0:48 ` agentzh at gmail dot com
@ 2023-04-30 21:29 ` agentzh at gmail dot com
  2023-05-01 18:19 ` agentzh at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: agentzh at gmail dot com @ 2023-04-30 21:29 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30393

agentzh <agentzh at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|New lockdep error on a      |Lockdep error in
                   |fedora debug kernel         |_stp_ctl_work_callback()/_s
                   |                            |tp_request_exit() on a
                   |                            |fedora debug kernel

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/_stp_request_exit() on a fedora debug kernel
  2023-04-26 21:47 [Bug runtime/30393] New: New lockdep error on a fedora debug kernel agentzh at gmail dot com
                   ` (2 preceding siblings ...)
  2023-04-30 21:29 ` [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/_stp_request_exit() " agentzh at gmail dot com
@ 2023-05-01 18:19 ` agentzh at gmail dot com
  2023-05-01 18:19 ` [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/wake_up_interruptible() " agentzh at gmail dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: agentzh at gmail dot com @ 2023-05-01 18:19 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30393

--- Comment #3 from agentzh <agentzh at gmail dot com> ---
Seems like the kernel backtrace is a little bit misleading. It actually has
nothing to do with _stp_request_exit() after all:

```
(gdb) l *_stp_ctl_work_callback+0x8a
0x6cbb is in _stp_ctl_work_callback
(/path/to/systemtap/runtime/transport/transport.c:559).
554             stp_spin_unlock_irqrestore(&_stp_ctl_ready_lock, flags);
555
556             _stp_runtime_entryfn_put_context(c);
557
558             if (do_io)
559                     wake_up_interruptible(&_stp_ctl_wq);
560
561             /* if exit flag is set AND we have finished with
systemtap_module_init() */
562             if (unlikely(_stp_exit_flag && _stp_probes_started))
563                     _stp_request_exit();
```

And I also checked the disassembly of the .ko file, it is indeed the
`wake_up_interruptible()` call:

```
0000000000006be1 <_stp_ctl_work_callback>:
...
/path/to/systemtap/runtime/transport/transport.c:559
                wake_up_interruptible(&_stp_ctl_wq);
    6c50:       b9 00 00 00 00          mov    ecx,0x0
    6c55:       ba 01 00 00 00          mov    edx,0x1
    6c5a:       be 01 00 00 00          mov    esi,0x1
    6c5f:       48 c7 c7 00 00 00 00    mov    rdi,0x0
    6c66:       e8 00 00 00 00          call   6c6b
<_stp_ctl_work_callback+0x8a>
    6c6b:       eb c6                   jmp    6c33
<_stp_ctl_work_callback+0x52>
```

And _stp_ctl_work_callback+0x8a is the instruction at the address 6c6b above.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/wake_up_interruptible() on a fedora debug kernel
  2023-04-26 21:47 [Bug runtime/30393] New: New lockdep error on a fedora debug kernel agentzh at gmail dot com
                   ` (3 preceding siblings ...)
  2023-05-01 18:19 ` agentzh at gmail dot com
@ 2023-05-01 18:19 ` agentzh at gmail dot com
  2023-05-18 17:43 ` agentzh at gmail dot com
  2023-05-18 17:47 ` agentzh at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: agentzh at gmail dot com @ 2023-05-01 18:19 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30393

agentzh <agentzh at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Lockdep error in            |Lockdep error in
                   |_stp_ctl_work_callback()/_s |_stp_ctl_work_callback()/wa
                   |tp_request_exit() on a      |ke_up_interruptible() on a
                   |fedora debug kernel         |fedora debug kernel

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/wake_up_interruptible() on a fedora debug kernel
  2023-04-26 21:47 [Bug runtime/30393] New: New lockdep error on a fedora debug kernel agentzh at gmail dot com
                   ` (4 preceding siblings ...)
  2023-05-01 18:19 ` [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/wake_up_interruptible() " agentzh at gmail dot com
@ 2023-05-18 17:43 ` agentzh at gmail dot com
  2023-05-18 17:47 ` agentzh at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: agentzh at gmail dot com @ 2023-05-18 17:43 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30393

--- Comment #4 from agentzh <agentzh at gmail dot com> ---
Here's my best theory: a user process has grabbed the mm map lock, and in its
critical section, stap runtime's ctl work timer is fired; and inside the ctl
work callback, stap calls wake_up_interruptible(), which may try to trigger
scheduler context switches and hence that lockdep error "Illegal context switch
in RCU-bh read-side critical section!".

__wake_up_common_lock() appears in the lockdep backtrace, which is indeed
called from __wake_up(), which is wake_up_interruptible(). Furthermore, both
_stp_request_exit and _stp_ctl_work_callback are in the lockdep backtrace.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/wake_up_interruptible() on a fedora debug kernel
  2023-04-26 21:47 [Bug runtime/30393] New: New lockdep error on a fedora debug kernel agentzh at gmail dot com
                   ` (5 preceding siblings ...)
  2023-05-18 17:43 ` agentzh at gmail dot com
@ 2023-05-18 17:47 ` agentzh at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: agentzh at gmail dot com @ 2023-05-18 17:47 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=30393

--- Comment #5 from agentzh <agentzh at gmail dot com> ---
FWIW, I reproduced this lockdep error at least twice on Fedora's 5.11.22 debug
kernel yesterday while testing the user access begin patch for stap. They are
of the same or very similar lockdep error and backtrace.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-05-18 17:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-26 21:47 [Bug runtime/30393] New: New lockdep error on a fedora debug kernel agentzh at gmail dot com
2023-04-27 17:36 ` [Bug runtime/30393] " agentzh at gmail dot com
2023-04-28  0:48 ` agentzh at gmail dot com
2023-04-30 21:29 ` [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/_stp_request_exit() " agentzh at gmail dot com
2023-05-01 18:19 ` agentzh at gmail dot com
2023-05-01 18:19 ` [Bug runtime/30393] Lockdep error in _stp_ctl_work_callback()/wake_up_interruptible() " agentzh at gmail dot com
2023-05-18 17:43 ` agentzh at gmail dot com
2023-05-18 17:47 ` agentzh at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).