* [Bug runtime/29577] New: NULL pointer dereference in kernel tracepoint int3 execution
@ 2022-09-16 19:52 agentzh at gmail dot com
2022-09-16 20:49 ` [Bug runtime/29577] " agentzh at gmail dot com
2022-09-18 1:35 ` agentzh at gmail dot com
0 siblings, 2 replies; 3+ messages in thread
From: agentzh at gmail dot com @ 2022-09-16 19:52 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=29577
Bug ID: 29577
Summary: NULL pointer dereference in kernel tracepoint int3
execution
Product: systemtap
Version: unspecified
Status: UNCONFIRMED
Severity: critical
Priority: P2
Component: runtime
Assignee: systemtap at sourceware dot org
Reporter: agentzh at gmail dot com
Target Milestone: ---
Hi folks!
We observed several NULL pointer dereference errors in CentOS 8 kernel
`4.18.0-193.6.3.el8_2.x86_64`. They all look like this:
```
11948 [39391079.662703] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000008
11949 [39391094.375018] NMI watchdog: Watchdog detected hard LOCKUP on cpu
2Modules linked in: orxray_lj_c_off_cpu_2789022(OE) joydev binfmt_misc tcp_diag
inet_diag mpt3sas raid_class scsi_transport_sas 8021q garp mrp stp llc bonding
dell_rbu intel_rapl_msr intel_rapl_c 11949 ommon skx_edac nfit libnvdimm
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate
dell_smbios iTCO_wdt iTCO_vendor_support dell_wmi_descriptor wmi_bmof dcdbas
intel_uncore 11949 pcspkr intel_rapl_perf mei_me ipmi_si mei ipmi_devintf
i2c_i801 lpc_ich wmi ipmi_msghandler acpi_power_meter sch_fq tcp_bbr dummy
ip_tables xfs libcrc32c sd_mod sg mgag200 drm_vram_helper ttm drm_kms_helper
syscopyarea sysfillrect sysimgblt ahci fb_sys_fops libahc 11949 i drm
crc32c_intel libata megaraid_sas igb dca i2c_algo_bit i40e [last unloaded:
orxray_count_off_cpu_frames_2788869]
11950 [39391094.375028] CPU: 2 PID: 2789064 Comm: ps Kdump: loaded Tainted: G
OE --------- - - 4.18.0-193.6.3.el8_2.x86_64 #1
11951 [39391094.375029] Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS
2.6.4 04/09/2020
11952 [39391094.375029] RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1d0
11953 [39391094.375030] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89
c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90
<8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
11954 [39391094.375030] RSP: 0018:ffffb2fdc90f7c68 EFLAGS: 00000002
11955 [39391094.375031] RAX: 0000000000000101 RBX: ffff88c60fce9540 RCX:
0000000000000003
11956 [39391094.375031] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff88c60fce9540
11957 [39391094.375031] RBP: 000000000000000b R08: 0000000000000008 R09:
0000000000333a33
11958 [39391094.375032] R10: 8080808080808080 R11: 0000000000000010 R12:
ffffb2fdc90f7ca8
11959 [39391094.375032] R13: 0000000000000040 R14: ffff88c3506a0180 R15:
0000000000000000
11960 [39391094.375032] FS: 00007f2f11c9d940(0000) GS:ffff88c60fc80000(0000)
knlGS:0000000000000000
11961 [39391094.375033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
11962 [39391094.375033] CR2: 000055fd3a4ce038 CR3: 0000000d7d370002 CR4:
00000000007606e0
11963 [39391094.375033] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
11964 [39391094.375034] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
11965 [39391094.375034] PKRU: 55555554
11966 [39391094.375034] Call Trace:
11967 [39391094.375034] _raw_spin_lock_irq+0x23/0x26
11968 [39391094.375034] wq_worker_comm+0x77/0xd0
11969 [39391094.375034] proc_task_name+0x44/0x120
11970 [39391094.375035] ? num_to_str+0xac/0xc0
11971 [39391094.375035] do_task_stat+0x434/0xde0
11972 [39391094.375035] proc_single_show+0x4e/0x90
11973 [39391094.375035] seq_read+0x153/0x410
11974 [39391094.375035] vfs_read+0x91/0x140
11975 [39391094.375035] ksys_read+0x4f/0xb0
11976 [39391094.375035] do_syscall_64+0x5b/0x1a0
11977 [39391094.375036] entry_SYSCALL_64_after_hwframe+0x65/0xca
11978 [39391094.375036] RIP: 0033:0x7f2f11049a85
11979 [39391094.375036] Code: fe ff ff 50 48 8d 3d 32 d8 09 00 e8 b5 0e 02 00
0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 d5 4b 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05
<48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
11980 [39391094.375036] RSP: 002b:00007ffff7ee8b78 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
11981 [39391094.375037] RAX: ffffffffffffffda RBX: 00007f2f11a7a678 RCX:
00007f2f11049a85
11982 [39391094.375037] RDX: 0000000000000800 RSI: 000055fd3a4cb5e0 RDI:
0000000000000006
11983 [39391094.375037] RBP: 00007f2f11a7a670 R08: 00007f2f11875ea5 R09:
00007f2f110dac80
11984 [39391094.375037] R10: 0000000000000000 R11: 0000000000000246 R12:
0000000000000006
11985 [39391094.375038] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
11986 [39391094.375038] Kernel panic - not syncing: Hard LOCKUP
```
BTW, the orxray_XXX modules are stap ko modules.
Basically it is always a BUG for null pointer dereference followed by another
BUG for soft-lockup detected by the NMI watchdog.
Inspecting the kernel backtraces on all the CPUs in the kdump file reveals the
following full kernel backtraces for these two:
```
PID: 29 TASK: ffff88af005c0000 CPU: 3 COMMAND: "ksoftirqd/3"
#0 [fffffe0000088e50] crash_nmi_callback at ffffffff9984d033
#1 [fffffe0000088e58] nmi_handle at ffffffff99822893
#2 [fffffe0000088eb0] default_do_nmi at ffffffff99822d4e
#3 [fffffe0000088ed0] do_nmi at ffffffff99822f28
#4 [fffffe0000088ef0] end_repeat_nmi at ffffffff9a2016b4
[exception RIP: native_queued_spin_lock_slowpath+91]
RIP: ffffffff9990a92b RSP: ffffb2fdc683b910 RFLAGS: 00000002
RAX: 00000000005c0101 RBX: ffff88af00161ec0 RCX: 0000000000000003
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88c60fce9c40
RBP: ffff88c60fce9c40 R8: ffff88c60fce9560 R9: ffff88af07803230
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88af00162a7c
R13: 0000000000000000 R14: 0000000000000046 R15: 0000000000000003
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffb2fdc683b910] native_queued_spin_lock_slowpath at ffffffff9990a92b
#6 [ffffb2fdc683b910] _raw_spin_lock at ffffffff9a094f4c
#7 [ffffb2fdc683b918] try_to_wake_up at ffffffff998e1463
#8 [ffffb2fdc683b978] __queue_work at ffffffff998cdcb3
#9 [ffffb2fdc683b9c0] queue_work_on at ffffffff998cdfd4
#10 [ffffb2fdc683b9d0] mga_imageblit at ffffffffc0422666 [mgag200]
#11 [ffffb2fdc683b9e8] soft_cursor at ffffffff99c9cfa4
#12 [ffffb2fdc683ba40] bit_cursor at ffffffff99c9cbd2
#13 [ffffb2fdc683bb08] hide_cursor at ffffffff99d2d78a
#14 [ffffb2fdc683bb18] vt_console_print at ffffffff99d2fb8d
#15 [ffffb2fdc683bb80] console_unlock at ffffffff999171d9
#16 [ffffb2fdc683bbc0] vprintk_emit at ffffffff99918e24
#17 [ffffb2fdc683bc18] printk at ffffffff999193a6
#18 [ffffb2fdc683bc78] no_context at ffffffff99868880
#19 [ffffb2fdc683bcd0] do_page_fault at ffffffff99869262
#20 [ffffb2fdc683bd00] page_fault at ffffffff9a20120e
[exception RIP: poke_int3_handler+56]
RIP: ffffffff998266e8 RSP: ffffb2fdc683bdb0 RFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffb2fdc683bdd8 RCX: ffffffff9a200b87
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff9a08f7f6
RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#21 [ffffb2fdc683bdb8] do_int3 at ffffffff9981ed2e
#22 [ffffb2fdc683bdd0] int3 at ffffffff9a201124
#23 [ffffb2fdc683be58] __schedule at ffffffff9a08f7f7
#24 [ffffb2fdc683bed8] schedule at ffffffff9a08fd2f
#25 [ffffb2fdc683bee8] smpboot_thread_fn at ffffffff998d88cb
#26 [ffffb2fdc683bf10] kthread at ffffffff998d4812
#27 [ffffb2fdc683bf50] ret_from_fork at ffffffff9a20023f
```
This one corresponds to the dmesg error
```
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
```
By analyzing the PC value in the kernel backtrace, we found that it is crashed
in the poke_int3_handler() function in file arch/x86/kernel/alternative.c, on
line 832:
```
832 if (tp->addr != ip)
```
The offset of the `addr` in tp is indeed 8.
Also, we use gdb to find out `__schedule at ffffffff9a08f7f7` is for
kernel/sched/core.c:3431:
```
3431 trace_sched_switch(preempt, prev, next);
```
Our stap script indeed defines a `probe kernel.trace("sched_switch")`. So it
makes sense.
And there's another one for another CPU related to stap:
```
PID: 2789022 TASK: ffff88b461eadc40 CPU: 19 COMMAND: "stapio"
#0 [fffffe0000338e50] crash_nmi_callback at ffffffff9984d033
#1 [fffffe0000338e58] nmi_handle at ffffffff99822893
#2 [fffffe0000338eb0] default_do_nmi at ffffffff99822d4e
#3 [fffffe0000338ed0] do_nmi at ffffffff99822f28
#4 [fffffe0000338ef0] end_repeat_nmi at ffffffff9a2016b4
[exception RIP: smp_call_function_many+492]
RIP: ffffffff9994f2ac RSP: ffffb2fdc9b63d18 RFLAGS: 00000202
RAX: 0000000000000003 RBX: ffff88c6100eacc0 RCX: ffff88c60fcf04c0
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88af0059f1b0
RBP: ffffffff99826720 R8: 000000000002e040 R9: ffffffff9984ddda
R10: ffffe28c203a4600 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000020 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffb2fdc9b63d18] smp_call_function_many at ffffffff9994f2ac
#6 [ffffb2fdc9b63d58] on_each_cpu at ffffffff9994f368
#7 [ffffb2fdc9b63d78] text_poke_bp_batch at ffffffff998270b9
#8 [ffffb2fdc9b63db8] arch_jump_label_transform_apply at ffffffff9982433e
#9 [ffffb2fdc9b63dc0] __static_key_slow_dec_cpuslocked at ffffffff99a1464f
#10 [ffffb2fdc9b63de0] static_key_slow_dec at ffffffff99a146d3
#11 [ffffb2fdc9b63df0] tracepoint_probe_unregister at ffffffff99994452
#12 [ffffb2fdc9b63e20] do_tracepoint_probe_unregister at ffffffffc0c905b3
[orxray_lj_c_off_cpu_2789022]
#13 [ffffb2fdc9b63e28] stp_tracepoint_probe_unregister at ffffffffc0ca8147
[orxray_lj_c_off_cpu_2789022]
#14 [ffffb2fdc9b63e50] utrace_shutdown at ffffffffc0ca8440
[orxray_lj_c_off_cpu_2789022]
#15 [ffffb2fdc9b63e58] stap_stop_task_finder at ffffffffc0ca85ad
[orxray_lj_c_off_cpu_2789022]
#16 [ffffb2fdc9b63e60] systemtap_module_exit at ffffffffc0ca932d
[orxray_lj_c_off_cpu_2789022]
#17 [ffffb2fdc9b63e80] _stp_cleanup_and_exit at ffffffffc0ca956e
[orxray_lj_c_off_cpu_2789022]
#18 [ffffb2fdc9b63e90] _stp_ctl_write_cmd at ffffffffc0ca96d1
[orxray_lj_c_off_cpu_2789022]
#19 [ffffb2fdc9b63eb8] proc_reg_write at ffffffff99b4241c
#20 [ffffb2fdc9b63ed0] vfs_write at ffffffff99ac7835
#21 [ffffb2fdc9b63f00] ksys_write at ffffffff99ac7aaf
#22 [ffffb2fdc9b63f38] do_syscall_64 at ffffffff9980419b
#23 [ffffb2fdc9b63f50] entry_SYSCALL_64_after_hwframe at ffffffff9a2000ad
RIP: 00007f51f720faf7 RSP: 00007ffce4839fa0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f51f720faf7
RDX: 0000000000000004 RSI: 00007ffce4839fe8 RDI: 0000000000000005
RBP: 00007ffce4839fe8 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000004
R13: 000000000000000b R14: 00007f51f78356c0 R15: 00007ffce483a094
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
```
This is in the context of stapio while unregistering the module and the
tracepoint probes. But note that it is in utrace_shutdown(), which does not
involve the `sched_switch` tracepoint. So I'm not sure how these two contexts
are related to each other.
We tried hard to reproduce this on the same kernel in a controlled environment
but with no luck, even when the tracepoints are triggered 2M times per second
across 24 CPU cores, and the staprun command is repeated every second.
So far I only found the following upstream bug fix which may be related or may
not:
https://lore.kernel.org/all/20210805132717.23813-1-mathieu.desnoyers@efficios.com/
I have another theory is that maybe a previous run of a stap ko failed to
unregister its sched_switch tracepoint. The same tool was running periodically
in the same box after all.
Any help to debug this further will be highly recommended!
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug runtime/29577] NULL pointer dereference in kernel tracepoint int3 execution
2022-09-16 19:52 [Bug runtime/29577] New: NULL pointer dereference in kernel tracepoint int3 execution agentzh at gmail dot com
@ 2022-09-16 20:49 ` agentzh at gmail dot com
2022-09-18 1:35 ` agentzh at gmail dot com
1 sibling, 0 replies; 3+ messages in thread
From: agentzh at gmail dot com @ 2022-09-16 20:49 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=29577
--- Comment #1 from agentzh <agentzh at gmail dot com> ---
I just proposed a patch to fix it but haven't verified it since it's extremely
hard to reproduce:
https://gist.github.com/agentzh/905021679609619ec1b62011870cd926
It looks like related.
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug runtime/29577] NULL pointer dereference in kernel tracepoint int3 execution
2022-09-16 19:52 [Bug runtime/29577] New: NULL pointer dereference in kernel tracepoint int3 execution agentzh at gmail dot com
2022-09-16 20:49 ` [Bug runtime/29577] " agentzh at gmail dot com
@ 2022-09-18 1:35 ` agentzh at gmail dot com
1 sibling, 0 replies; 3+ messages in thread
From: agentzh at gmail dot com @ 2022-09-18 1:35 UTC (permalink / raw)
To: systemtap
https://sourceware.org/bugzilla/show_bug.cgi?id=29577
agentzh <agentzh at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
--- Comment #2 from agentzh <agentzh at gmail dot com> ---
OK, now after analyzing the kdump file in more detail, I believe it is this
kernel bug fixed upstream almost 3 years ago:
https://lkml.iu.edu/hypermail/linux/kernel/1912.0/00303.html
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-09-18 1:35 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-16 19:52 [Bug runtime/29577] New: NULL pointer dereference in kernel tracepoint int3 execution agentzh at gmail dot com
2022-09-16 20:49 ` [Bug runtime/29577] " agentzh at gmail dot com
2022-09-18 1:35 ` agentzh at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).