From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 2C8953858D28; Sun, 17 Dec 2023 01:31:04 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2C8953858D28 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1702776664; bh=OgBaLE6Ee+Biybos6mOgP30Vuy8V81+7JFttATE6nlQ=; h=From:To:Subject:Date:From; b=nABKVpn+DRXUTTWltlGu8fV1LMhMYKNcH8TPk2Y7m3qcTTop4ukLqi7X+E/S9BhhW GMKtbtRNo/sxwSTF+DbBIyc3T9CwQYqU2MA/zP1aZahOSUcvdFsTQEojBML9+rdGkg BjJGtkIj/i1IqXCJWctC0CewWySwfBWGtY+BLzcI= From: "agentzh at gmail dot com" To: systemtap@sourceware.org Subject: [Bug runtime/31176] New: Avoid spin lock deadlocks in memory pool allocations for mixed NMI and non-NMI contexts. Date: Sun, 17 Dec 2023 01:30:58 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: systemtap X-Bugzilla-Component: runtime X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: agentzh at gmail dot com X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: systemtap at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://sourceware.org/bugzilla/show_bug.cgi?id=3D31176 Bug ID: 31176 Summary: Avoid spin lock deadlocks in memory pool allocations for mixed NMI and non-NMI contexts. Product: systemtap Version: unspecified Status: UNCONFIRMED Severity: normal Priority: P2 Component: runtime Assignee: systemtap at sourceware dot org Reporter: agentzh at gmail dot com Target Milestone: --- The kernel's lockdep finds dead locks in the stap memory pool allocator's spinlocks when mixing NMI and non-NMI contexts. The following dmesg error f= rom lockdep can always be reproduced on a debug kernel of Fedora by running TES= T 4 in testsuite/systemtap.base/kernel-hw-breakpoint-addr.exp: ``` [ 426.561994] stap_d42289a7e6d2399d2047d1d6eb75940b_7879 (kernel-hw-breakpoint-addr_1.stp): systemtap: 5.1/0.183, base: ffffffffc0e91000, memory: 1276data/200text/459ctx/524390net/375alloc kb, probes: 3 [ 426.591193] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D [ 426.591193] WARNING: inconsistent lock state [ 426.591194] 5.11.22-100.fc32.x86_64+debug #1 Tainted: G OE [ 426.591194] -------------------------------- [ 426.591194] inconsistent {INITIAL USE} -> {IN-NMI} usage. [ 426.591195] a.out/7882 [HC1[1]:SC0[0]:HE0:SE1] takes: [ 426.591195] ffff88813de33e30 (lock#8){....}-{2:2}, at: _stp_mempool_alloc+0x27/0x1a0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591198] {INITIAL USE} state was registered at: [ 426.591198] lock_acquire+0x1cc/0x780 [ 426.591199] _raw_spin_lock_irqsave+0x4d/0x90 [ 426.591199] _stp_mempool_alloc+0x27/0x1a0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591199] _stp_transport_init+0x668/0x1870 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591200] do_one_initcall+0xfb/0x530 [ 426.591200] do_init_module+0x1ce/0x7a0 [ 426.591200] load_module+0x78b5/0x9fb0 [ 426.591201] __do_sys_init_module+0x18f/0x220 [ 426.591202] do_syscall_64+0x33/0x40 [ 426.591203] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 426.591204] irq event stamp: 4434 [ 426.591204] hardirqs last enabled at (4433): [] asm_exc_page_fault+0x1e/0x30 [ 426.591205] hardirqs last disabled at (4434): [] exc_debug+0x6c/0x150 [ 426.591205] softirqs last enabled at (4258): [] fpu__clear+0x86/0x160 [ 426.591206] softirqs last disabled at (4254): [] fpu__clear+0x5/0x160 [ 426.591207] other info that might help us debug this: [ 426.591207] Possible unsafe locking scenario: [ 426.591208] CPU0 [ 426.591209] ---- [ 426.591213] lock(lock#8); [ 426.591219] [ 426.591219] lock(lock#8); [ 426.591222] *** DEADLOCK *** [ 426.591222] no locks held by a.out/7882. [ 426.591223] stack backtrace: [ 426.591224] CPU: 6 PID: 7882 Comm: a.out Tainted: G OE=20=20= =20=20 5.11.22-100.fc32.x86_64+debug #1 [ 426.591225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014 [ 426.591226] Call Trace: [ 426.591226] <#DB> [ 426.591226] dump_stack+0xae/0xe5 [ 426.591227] lock_acquire.cold+0x3b/0x40 [ 426.591227] ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0 [ 426.591228] ? _stp_mempool_alloc+0x27/0x1a0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591229] _raw_spin_lock_irqsave+0x4d/0x90 [ 426.591229] ? _stp_mempool_alloc+0x27/0x1a0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591230] _stp_mempool_alloc+0x27/0x1a0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591230] _stp_ctl_get_buffer+0x1cf/0x2f0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591231] _stp_ctl_log_werr+0x37/0x290 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591231] ? __clear_user+0x47/0x70 [ 426.591232] _stp_warn+0xa3/0xc0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591232] ? _stp_vlog.constprop.0+0x40/0x40 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591233] unwind+0x2cf/0x3d0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7= 879] [ 426.591233] ? _stp_umod_lookup.constprop.0+0x1a0/0x1a0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591234] _stp_get_uregs+0x46d/0x6f0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591234] probe_6378+0x50c/0xec0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591235] ? _stp_snprint_addr.constprop.0.isra.0+0x940/0x940 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591235] enter_hwbkpt_probe+0x582/0xaf0 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591236] ? __stp_utrace_task_finder_target_syscall_entry+0xc90/0xc90 [stap_d42289a7e6d2399d2047d1d6eb75940b_7879] [ 426.591236] __perf_event_overflow+0x11e/0x320 [ 426.591237] perf_bp_event+0x13b/0x150 [ 426.591237] ? __perf_sw_event+0x130/0x130 [ 426.591237] hw_breakpoint_exceptions_notify+0x204/0x310 [ 426.591238] ? atomic_notifier_call_chain+0x5/0x100 [ 426.591238] notifier_call_chain+0x9e/0x180 [ 426.591239] atomic_notifier_call_chain+0x64/0x100 [ 426.591239] notify_die+0x81/0xd0 [ 426.591239] ? atomic_notifier_call_chain+0x100/0x100 [ 426.591240] ? rcu_nmi_enter+0x7c/0xd0 [ 426.591240] notify_debug+0x25/0x30 [ 426.591240] exc_debug+0xe4/0x150 [ 426.591241] asm_exc_debug+0x19/0x30 [ 426.591241] RIP: 0010:__clear_user+0x47/0x70 [ 426.591242] Code: ff 0f 01 cb 48 89 d8 48 c1 eb 03 48 89 ef 83 e0 07 48 = 89 d9 48 85 c9 74 19 66 2e 0f 1f 84 00 00 00 00 00 48 c7 07 00 00 00 00 <48> 8= 3 c7 08 ff c9 75 f1 48 89 c1 85 c9 74 0a c6 07 00 48 ff c7 ff [ 426.591244] RSP: 0018:ffffc90000acfbe0 EFLAGS: 00040206 [ 426.591244] RAX: 0000000000000004 RBX: 00000000000001fc RCX: 00000000000001fc [ 426.591245] RDX: 1ffff11025cb51ba RSI: 0000000000000000 RDI: 000000000040401c [ 426.591245] RBP: 000000000040401c R08: 0000000000000000 R09: 0000000000000001 [ 426.591246] R10: fffffbfff1b4506d R11: 0000000000000001 R12: ffff888142192aa0 [ 426.591246] R13: 0000000000000000 R14: 000000000040401c R15: ffff88815a6cb800 [ 426.591247] [ 426.591247] load_elf_binary+0x27f8/0x3f10 [ 426.591247] ? elf_core_dump+0x2ce0/0x2ce0 [ 426.591248] ? ima_file_mprotect+0x380/0x380 [ 426.591248] bprm_execve+0x684/0x1580 [ 426.591249] ? copy_strings.isra.0+0x680/0x680 [ 426.591249] do_execveat_common+0x55d/0x730 [ 426.591249] ? bprm_execve+0x1580/0x1580 [ 426.591250] ? getname_flags.part.0+0x8e/0x450 [ 426.591250] __x64_sys_execve+0x8f/0xc0 [ 426.591250] do_syscall_64+0x33/0x40 [ 426.591251] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 426.591251] RIP: 0033:0x7fda303e21ab [ 426.591252] Code: Unable to access opcode bytes at RIP 0x7fda303e2181. [ 426.591252] RSP: 002b:00007ffee7238158 EFLAGS: 00000202 ORIG_RAX: 000000000000003b [ 426.591253] RAX: ffffffffffffffda RBX: 00000000021c23a0 RCX: 00007fda303e21ab [ 426.591254] RDX: 00007ffee7238680 RSI: 00000000021c2380 RDI: 00000000021c23a0 [ 426.591255] RBP: 00007ffee72381c0 R08: 00007ffee7238290 R09: 0000000000000000 [ 426.591255] R10: 0000000000401059 R11: 0000000000000202 R12: 00000000021c2380 [ 426.591256] R13: 00007ffee7238680 R14: 0000000000000001 R15: 0000000000000000 ``` In addition to the spin lock in pool->lock above, there's also a similar lockdep error report in _stp_ctl_ready_lock held by _stp_ctl_log_werr(). --=20 You are receiving this mail because: You are the assignee for the bug.=