From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 174AF3858C5F; Tue, 19 Mar 2024 08:10:16 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 174AF3858C5F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1710835816; bh=xCf9KIWVzgMRxhrwLSCngy2qr65n2HA50srV9korONY=; h=From:To:Subject:Date:In-Reply-To:References:From; b=wEujT1kIkUZ+dhiuxP2uQ0NaK/MXVzN4VG/I+ZbqaK9elvN/GFXvk6VWXsa+GiUFC vqVh0vCNC4sIVgXNHgSINZ4boE+zFK2LPqAtwp4TWz1v7PaZy5qi3xzWHCf8a9h/I1 g81MYAr/0PnmJcd77OYpmuw+QyUODaW0Hc9OXSFY= From: "mcermak at redhat dot com" To: systemtap@sourceware.org Subject: [Bug runtime/31500] stapio exited with irqs disabled Date: Tue, 19 Mar 2024 08:10:15 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: systemtap X-Bugzilla-Component: runtime X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: mcermak at redhat dot com X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: systemtap at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://sourceware.org/bugzilla/show_bug.cgi?id=3D31500 --- Comment #2 from Martin Cermak --- (In reply to William Cohen from comment #1) > This being encountered in during a regular "make installcheck"? Or are > multiple runs requireed to trigger this? Yup, for me this happens once or twice per a "make installcheck" run. > Is this running inside a guest VM or bare hardware? > > How many cores does the machine running the tests have? >=20 > Does the problem go away when the number of available cores is reduced (or > even limited to 1)? It's my local VM. Originally it had 2 cores, but I've lowered it down to 1 core and reproduced again: [ 5286.549884] CR2: ffffffffaed64670 [ 5286.550344] ---[ end trace 0000000000000000 ]--- [ 5286.550792] RIP: 0010:arch_adjust_kprobe_addr+0x9/0x60 [ 5286.551245] Code: cc cc cc 48 89 de 48 89 ef 5b 5d e9 01 f9 ff ff 90 90 = 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <8b> 0= 7 3d 66 0f 1f 00 74 24 25 ff ff ff fe 41 b8 0c f0 e1 05 41 f7 [ 5286.552147] RSP: 0018:ffffb41641d67cb0 EFLAGS: 00010282 [ 5286.552592] RAX: 0000000000000001 RBX: ffffffffaed64670 RCX: ffffffffaed64670 [ 5286.553040] RDX: ffffb41641d67ce7 RSI: 0000000000000000 RDI: ffffffffaed64670 [ 5286.553478] RBP: ffffb41641d67ce7 R08: 0000000000000000 R09: ffffb41641d67cb8 [ 5286.553914] R10: 0000000000034132 R11: fffffffffd29b98f R12: 0000000000000000 [ 5286.554341] R13: ffffffffc09645f0 R14: 0000000000000000 R15: ffffffffc09651c0 [ 5286.554764] FS: 00007f3bb1d6b180(0000) GS:ffff9083b8400000(0000) knlGS:0000000000000000 [ 5286.555192] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5286.555611] CR2: ffffffffaed64670 CR3: 000000001687e004 CR4: 0000000000370ef0 [ 5286.556034] DR0: 0000000000404010 DR1: 0000000000000000 DR2: 0000000000000000 [ 5286.556444] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 [ 5286.556862] note: stapio[128685] exited with irqs disabled [10845.009756] clocksource: timekeeping watchdog on CPU0: kvm-clock retried= 1 times before success > I wonder if the "stapio[188000] exited with irqs disabled" is causing the > "workqueue: drain_vmap_area_work hogged CPU for ..." messages. >=20 > It would also be good to fully understand the following in the dmesg outp= ut. > What is CR2 (and CR3/CR4 etc.) referring to: >=20 > [16452.767193] CR2: ffffffffa64a65c0 > [16452.769368] ---[ end trace 0000000000000000 ]--- > [16452.771499] RIP: 0010:arch_adjust_kprobe_addr+0x41/0xe0 > [16452.773611] Code: 48 89 d3 48 ba 00 00 00 00 00 fc ff df 48 83 ec 08 0f > b6 0c 11 48 89 fa 83 e2 07 83 c2 03 38 ca 7c 08 84 c9 0f 85 85 00 00 00 <= 8b> > 55 00 81 fa 66 0f 1f 00 74 4f 81 e2 ff ff ff fe b9 0c f0 e1 05 > [16452.777850] RSP: 0018:ffffc90002027960 EFLAGS: 00010246 > [16452.779941] RAX: 0000000000000001 RBX: ffffc90002027a58 RCX: > 0000000000000000 > [16452.782014] RDX: 0000000000000003 RSI: 0000000000000000 RDI: > ffffffffa64a65c0 > [16452.784058] RBP: ffffffffa64a65c0 R08: fffffbfff4858cff R09: > 0000000000000000 > [16452.786089] R10: 0000000000000000 R11: 0000000000000001 R12: > 1ffff92000404f31 > [16452.788180] R13: ffffffffc10ca498 R14: 0000000000000000 R15: > ffffffffc20ada98 > [16452.790166] FS: 00007f1eaa74a180(0000) GS:ffff888115400000(0000) > knlGS:0000000000000000 > [16452.792138] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [16452.794077] CR2: ffffffffa64a65c0 CR3: 000000010e99e005 CR4: > 0000000000370ef0 > [16452.796007] DR0: 0000000000404010 DR1: 0000000000000000 DR2: > 0000000000000000 > [16452.797910] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000600 > [16452.799774] note: stapio[188000] exited with irqs disabled "CR2" is mentioned twice in the new dmesg snippet I've added to this comment above. Looking up my dmesg, it's mentioned there once again earlier. It se= ems to refer to a page fault: [ 5286.379371] stap_3727aef4065cf8a32c15be31b4bdef0_128685 (poll_map.stp): systemtap: 5.1/0.191, base: ffffffffc08cc000, memory: 3744data/72text/15ctx/16486net/2247alloc kb, probes: 80 [ 5286.479244] BUG: unable to handle page fault for address: ffffffffaed646= 70 [ 5286.479272] #PF: supervisor read access in kernel mode [ 5286.479280] #PF: error_code(0x0000) - not-present page [ 5286.479288] PGD 2d427067 P4D 2d427067 PUD 2d428063 PMD 1052c3063 PTE 800fffffd229b062 [ 5286.479302] Oops: 0000 [#1] PREEMPT SMP PTI [ 5286.479310] CPU: 0 PID: 128685 Comm: stapio Tainted: G OE=20= =20=20=20 ------- --- 6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64 #1 [ 5286.479327] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc36 04/01/2014 [ 5286.479338] RIP: 0010:arch_adjust_kprobe_addr+0x9/0x60 [ 5286.479349] Code: cc cc cc 48 89 de 48 89 ef 5b 5d e9 01 f9 ff ff 90 90 = 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <8b> 0= 7 3d 66 0f 1f 00 74 24 25 ff ff ff fe 41 b8 0c f0 e1 05 41 f7 [ 5286.479371] RSP: 0018:ffffb41641d67cb0 EFLAGS: 00010282 [ 5286.479379] RAX: 0000000000000001 RBX: ffffffffaed64670 RCX: ffffffffaed64670 [ 5286.479389] RDX: ffffb41641d67ce7 RSI: 0000000000000000 RDI: ffffffffaed64670 [ 5286.479399] RBP: ffffb41641d67ce7 R08: 0000000000000000 R09: ffffb41641d67cb8 [ 5286.479408] R10: 0000000000034132 R11: fffffffffd29b98f R12: 0000000000000000 [ 5286.479418] R13: ffffffffc09645f0 R14: 0000000000000000 R15: ffffffffc09651c0 [ 5286.479427] FS: 00007f3bb1d6b180(0000) GS:ffff9083b8400000(0000) knlGS:0000000000000000 [ 5286.479438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5286.479446] CR2: ffffffffaed64670 CR3: 000000001687e004 CR4: 0000000000370ef0 [ 5286.479459] DR0: 0000000000404010 DR1: 0000000000000000 DR2: 0000000000000000 [ 5286.479471] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 [ 5286.479481] Call Trace: [ 5286.479486] [ 5286.479491] ? __die+0x23/0x70 [ 5286.479498] ? __pfx_vfs_caches_init_early+0x10/0x10 [ 5286.479507] ? page_fault_oops+0x170/0x580 [ 5286.479515] ? __pfx_vfs_caches_init_early+0x10/0x10 [ 5286.479524] ? exc_page_fault+0x170/0x180 [ 5286.479532] ? asm_exc_page_fault+0x26/0x30 [ 5286.479542] ? __pfx_vfs_caches_init_early+0x10/0x10 [ 5286.479551] ? __pfx_vfs_caches_init_early+0x10/0x10 [ 5286.479559] ? __pfx_vfs_caches_init_early+0x10/0x10 [ 5286.479566] ? arch_adjust_kprobe_addr+0x9/0x60 [ 5286.479575] _kprobe_addr+0x64/0x90 [ 5286.479582] register_kprobe+0x42/0x6a0 [ 5286.479590] stapkp_register_probe+0x124/0x180 [stap_3727aef4065cf8a32c15be31b4bdef0_128685] [ 5286.479610] _stp_ctl_write_cmd+0x1037/0x2390 [stap_3727aef4065cf8a32c15be31b4bdef0_128685] [ 5286.479629] ? inode_security+0x22/0x60 [ 5286.479637] ? selinux_file_permission+0x14f/0x180 [ 5286.479646] proc_reg_write+0x59/0xa0 [ 5286.479653] vfs_write+0xf5/0x460 [ 5286.479660] ? __do_sys_clone3+0xe4/0x120 [ 5286.479669] ? _copy_from_user+0x2f/0x70 [ 5286.479944] ? __x64_sys_rt_sigprocmask+0xdb/0x150 [ 5286.480205] ksys_write+0x6d/0xf0 [ 5286.480456] do_syscall_64+0x85/0x170 [ 5286.480704] ? syscall_exit_to_user_mode+0x69/0x220 [ 5286.480954] ? do_syscall_64+0x94/0x170 [ 5286.481193] ? exc_page_fault+0x7e/0x180 [ 5286.481432] entry_SYSCALL_64_after_hwframe+0x6c/0x74 [ 5286.481669] RIP: 0033:0x7f3bb1e777fd [ 5286.481918] Code: e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 08 = 6a f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 b8 01 00 00 00 0f 05 <48> 3= d 00 f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 5f 6a f8 ff 48 8b [ 5286.482399] RSP: 002b:00007ffc3dd21c60 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 [ 5286.482640] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f3bb1e777fd [ 5286.482893] RDX: 000000000000000c RSI: 00007ffc3dd21c90 RDI: 0000000000000004 [ 5286.483126] RBP: 00007ffc3dd21c80 R08: 0000000000000000 R09: 00007ffc3dd20e57 [ 5286.483351] R10: 0000000000000008 R11: 0000000000000293 R12: 00007ffc3dd22120 [ 5286.483572] R13: 00007ffc3dd221a4 R14: 0000000000000001 R15: 00007ffc3dd22100 [ 5286.483793] [ 5286.484014] Modules linked in: stap_3727aef4065cf8a32c15be31b4bdef0_128685(OE) stap_5f2af8997e31ed2aec4c86ba029f24e_16223(OE) stap_09721f51216043267ec96771bff9046_14182(OE) intel_rapl_msr intel_rapl_co= mmon intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg kvm snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm irqbypa= ss rapl iTCO_wdt intel_pmc_bxt iTCO_vendor_support rfkill snd_timer i2c_i801 s= nd i2c_smbus lpc_ich soundcore virtio_net joydev virtio_balloon net_failover failover sunrpc loop fuse nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_sss= e3 sha256_ssse3 virtio_console sha1_ssse3 virtio_gpu virtio_blk virtio_dma_buf serio_raw qemu_fw_cfg AI seems to be suggesting me that CR is a special purpose "control" register typically holding the address that caused a page fault on x86_64. --=20 You are receiving this mail because: You are the assignee for the bug.=