public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
From: "mcermak at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: systemtap@sourceware.org
Subject: [Bug runtime/31500] stapio exited with irqs disabled
Date: Tue, 19 Mar 2024 08:10:15 +0000	[thread overview]
Message-ID: <bug-31500-6586-Amt185Zapo@http.sourceware.org/bugzilla/> (raw)
In-Reply-To: <bug-31500-6586@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31500

--- Comment #2 from Martin Cermak <mcermak at redhat dot com> ---
(In reply to William Cohen from comment #1)
> This being encountered in during a regular "make installcheck"? Or are
> multiple runs requireed to trigger this?

Yup, for me this happens once or twice per a "make installcheck" run.

> Is this running inside a guest VM or bare hardware?
>
> How many cores does the machine running the tests have?
> 
> Does the problem go away when the number of available cores is reduced (or
> even limited to 1)?

It's my local VM.  Originally it had 2 cores, but I've lowered it down to 1
core and reproduced again:

[ 5286.549884] CR2: ffffffffaed64670
[ 5286.550344] ---[ end trace 0000000000000000 ]---
[ 5286.550792] RIP: 0010:arch_adjust_kprobe_addr+0x9/0x60
[ 5286.551245] Code: cc cc cc 48 89 de 48 89 ef 5b 5d e9 01 f9 ff ff 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <8b> 07 3d
66 0f 1f 00 74 24 25 ff ff ff fe 41 b8 0c f0 e1 05 41 f7
[ 5286.552147] RSP: 0018:ffffb41641d67cb0 EFLAGS: 00010282
[ 5286.552592] RAX: 0000000000000001 RBX: ffffffffaed64670 RCX:
ffffffffaed64670
[ 5286.553040] RDX: ffffb41641d67ce7 RSI: 0000000000000000 RDI:
ffffffffaed64670
[ 5286.553478] RBP: ffffb41641d67ce7 R08: 0000000000000000 R09:
ffffb41641d67cb8
[ 5286.553914] R10: 0000000000034132 R11: fffffffffd29b98f R12:
0000000000000000
[ 5286.554341] R13: ffffffffc09645f0 R14: 0000000000000000 R15:
ffffffffc09651c0
[ 5286.554764] FS:  00007f3bb1d6b180(0000) GS:ffff9083b8400000(0000)
knlGS:0000000000000000
[ 5286.555192] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5286.555611] CR2: ffffffffaed64670 CR3: 000000001687e004 CR4:
0000000000370ef0
[ 5286.556034] DR0: 0000000000404010 DR1: 0000000000000000 DR2:
0000000000000000
[ 5286.556444] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000600
[ 5286.556862] note: stapio[128685] exited with irqs disabled
[10845.009756] clocksource: timekeeping watchdog on CPU0: kvm-clock retried 1
times before success

> I wonder if the "stapio[188000] exited with irqs disabled" is causing the
> "workqueue: drain_vmap_area_work hogged CPU for ..." messages.
> 
> It would also be good to fully understand the following in the dmesg output.
> What is CR2 (and CR3/CR4 etc.) referring to:
> 
> [16452.767193] CR2: ffffffffa64a65c0
> [16452.769368] ---[ end trace 0000000000000000 ]---
> [16452.771499] RIP: 0010:arch_adjust_kprobe_addr+0x41/0xe0
> [16452.773611] Code: 48 89 d3 48 ba 00 00 00 00 00 fc ff df 48 83 ec 08 0f
> b6 0c 11 48 89 fa 83 e2 07 83 c2 03 38 ca 7c 08 84 c9 0f 85 85 00 00 00 <8b>
> 55 00 81 fa 66 0f 1f 00 74 4f 81 e2 ff ff ff fe b9 0c f0 e1 05
> [16452.777850] RSP: 0018:ffffc90002027960 EFLAGS: 00010246
> [16452.779941] RAX: 0000000000000001 RBX: ffffc90002027a58 RCX:
> 0000000000000000
> [16452.782014] RDX: 0000000000000003 RSI: 0000000000000000 RDI:
> ffffffffa64a65c0
> [16452.784058] RBP: ffffffffa64a65c0 R08: fffffbfff4858cff R09:
> 0000000000000000
> [16452.786089] R10: 0000000000000000 R11: 0000000000000001 R12:
> 1ffff92000404f31
> [16452.788180] R13: ffffffffc10ca498 R14: 0000000000000000 R15:
> ffffffffc20ada98
> [16452.790166] FS:  00007f1eaa74a180(0000) GS:ffff888115400000(0000)
> knlGS:0000000000000000
> [16452.792138] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [16452.794077] CR2: ffffffffa64a65c0 CR3: 000000010e99e005 CR4:
> 0000000000370ef0
> [16452.796007] DR0: 0000000000404010 DR1: 0000000000000000 DR2:
> 0000000000000000
> [16452.797910] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000600
> [16452.799774] note: stapio[188000] exited with irqs disabled

"CR2" is mentioned twice in the new dmesg snippet I've added to this comment
above.  Looking up my dmesg, it's mentioned there once again earlier. It seems
to refer to a page fault:


[ 5286.379371] stap_3727aef4065cf8a32c15be31b4bdef0_128685 (poll_map.stp):
systemtap: 5.1/0.191, base: ffffffffc08cc000, memory:
3744data/72text/15ctx/16486net/2247alloc kb, probes: 80
[ 5286.479244] BUG: unable to handle page fault for address: ffffffffaed64670
[ 5286.479272] #PF: supervisor read access in kernel mode
[ 5286.479280] #PF: error_code(0x0000) - not-present page
[ 5286.479288] PGD 2d427067 P4D 2d427067 PUD 2d428063 PMD 1052c3063 PTE
800fffffd229b062
[ 5286.479302] Oops: 0000 [#1] PREEMPT SMP PTI
[ 5286.479310] CPU: 0 PID: 128685 Comm: stapio Tainted: G           OE    
-------  ---  6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64 #1
[ 5286.479327] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.1-2.fc36 04/01/2014
[ 5286.479338] RIP: 0010:arch_adjust_kprobe_addr+0x9/0x60
[ 5286.479349] Code: cc cc cc 48 89 de 48 89 ef 5b 5d e9 01 f9 ff ff 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <8b> 07 3d
66 0f 1f 00 74 24 25 ff ff ff fe 41 b8 0c f0 e1 05 41 f7
[ 5286.479371] RSP: 0018:ffffb41641d67cb0 EFLAGS: 00010282
[ 5286.479379] RAX: 0000000000000001 RBX: ffffffffaed64670 RCX:
ffffffffaed64670
[ 5286.479389] RDX: ffffb41641d67ce7 RSI: 0000000000000000 RDI:
ffffffffaed64670
[ 5286.479399] RBP: ffffb41641d67ce7 R08: 0000000000000000 R09:
ffffb41641d67cb8
[ 5286.479408] R10: 0000000000034132 R11: fffffffffd29b98f R12:
0000000000000000
[ 5286.479418] R13: ffffffffc09645f0 R14: 0000000000000000 R15:
ffffffffc09651c0
[ 5286.479427] FS:  00007f3bb1d6b180(0000) GS:ffff9083b8400000(0000)
knlGS:0000000000000000
[ 5286.479438] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5286.479446] CR2: ffffffffaed64670 CR3: 000000001687e004 CR4:
0000000000370ef0
[ 5286.479459] DR0: 0000000000404010 DR1: 0000000000000000 DR2:
0000000000000000
[ 5286.479471] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000600
[ 5286.479481] Call Trace:
[ 5286.479486]  <TASK>
[ 5286.479491]  ? __die+0x23/0x70
[ 5286.479498]  ? __pfx_vfs_caches_init_early+0x10/0x10
[ 5286.479507]  ? page_fault_oops+0x170/0x580
[ 5286.479515]  ? __pfx_vfs_caches_init_early+0x10/0x10
[ 5286.479524]  ? exc_page_fault+0x170/0x180
[ 5286.479532]  ? asm_exc_page_fault+0x26/0x30
[ 5286.479542]  ? __pfx_vfs_caches_init_early+0x10/0x10
[ 5286.479551]  ? __pfx_vfs_caches_init_early+0x10/0x10
[ 5286.479559]  ? __pfx_vfs_caches_init_early+0x10/0x10
[ 5286.479566]  ? arch_adjust_kprobe_addr+0x9/0x60
[ 5286.479575]  _kprobe_addr+0x64/0x90
[ 5286.479582]  register_kprobe+0x42/0x6a0
[ 5286.479590]  stapkp_register_probe+0x124/0x180
[stap_3727aef4065cf8a32c15be31b4bdef0_128685]
[ 5286.479610]  _stp_ctl_write_cmd+0x1037/0x2390
[stap_3727aef4065cf8a32c15be31b4bdef0_128685]
[ 5286.479629]  ? inode_security+0x22/0x60
[ 5286.479637]  ? selinux_file_permission+0x14f/0x180
[ 5286.479646]  proc_reg_write+0x59/0xa0
[ 5286.479653]  vfs_write+0xf5/0x460
[ 5286.479660]  ? __do_sys_clone3+0xe4/0x120
[ 5286.479669]  ? _copy_from_user+0x2f/0x70
[ 5286.479944]  ? __x64_sys_rt_sigprocmask+0xdb/0x150
[ 5286.480205]  ksys_write+0x6d/0xf0
[ 5286.480456]  do_syscall_64+0x85/0x170
[ 5286.480704]  ? syscall_exit_to_user_mode+0x69/0x220
[ 5286.480954]  ? do_syscall_64+0x94/0x170
[ 5286.481193]  ? exc_page_fault+0x7e/0x180
[ 5286.481432]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
[ 5286.481669] RIP: 0033:0x7f3bb1e777fd
[ 5286.481918] Code: e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 08 6a
f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 b8 01 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 5f 6a f8 ff 48 8b
[ 5286.482399] RSP: 002b:00007ffc3dd21c60 EFLAGS: 00000293 ORIG_RAX:
0000000000000001
[ 5286.482640] RAX: ffffffffffffffda RBX: 0000000000000008 RCX:
00007f3bb1e777fd
[ 5286.482893] RDX: 000000000000000c RSI: 00007ffc3dd21c90 RDI:
0000000000000004
[ 5286.483126] RBP: 00007ffc3dd21c80 R08: 0000000000000000 R09:
00007ffc3dd20e57
[ 5286.483351] R10: 0000000000000008 R11: 0000000000000293 R12:
00007ffc3dd22120
[ 5286.483572] R13: 00007ffc3dd221a4 R14: 0000000000000001 R15:
00007ffc3dd22100
[ 5286.483793]  </TASK>
[ 5286.484014] Modules linked in:
stap_3727aef4065cf8a32c15be31b4bdef0_128685(OE)
stap_5f2af8997e31ed2aec4c86ba029f24e_16223(OE)
stap_09721f51216043267ec96771bff9046_14182(OE) intel_rapl_msr intel_rapl_common
intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel
snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg kvm snd_intel_sdw_acpi
snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm irqbypass
rapl iTCO_wdt intel_pmc_bxt iTCO_vendor_support rfkill snd_timer i2c_i801 snd
i2c_smbus lpc_ich soundcore virtio_net joydev virtio_balloon net_failover
failover sunrpc loop fuse nfnetlink zram crct10dif_pclmul crc32_pclmul
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3
sha256_ssse3 virtio_console sha1_ssse3 virtio_gpu virtio_blk virtio_dma_buf
serio_raw qemu_fw_cfg


AI seems to be suggesting me that CR is a special purpose "control" register
typically holding the address that caused a page fault on x86_64.

-- 
You are receiving this mail because:
You are the assignee for the bug.

  parent reply	other threads:[~2024-03-19  8:10 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-16 13:03 [Bug runtime/31500] New: " mcermak at redhat dot com
2024-03-18 21:48 ` [Bug runtime/31500] " wcohen at redhat dot com
2024-03-19  8:10 ` mcermak at redhat dot com [this message]
2024-03-19 15:49 ` wcohen at redhat dot com
2024-03-19 20:57 ` wcohen at redhat dot com
2024-03-20 14:19 ` wcohen at redhat dot com
2024-03-20 15:25 ` wcohen at redhat dot com
2024-03-20 21:10 ` wcohen at redhat dot com
2024-03-21 14:31 ` wcohen at redhat dot com
2024-03-21 15:13 ` wcohen at redhat dot com
2024-03-22 14:40 ` wcohen at redhat dot com

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-31500-6586-Amt185Zapo@http.sourceware.org/bugzilla/ \
    --to=sourceware-bugzilla@sourceware.org \
    --cc=systemtap@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).