From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 65D5E385697B; Mon, 30 Oct 2023 00:13:09 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 65D5E385697B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1698624789; bh=I/zgBHIzNqxpgz2mefrbwEV4iRo8a2onoBZfOClcIj4=; h=From:To:Subject:Date:From; b=wa76G8nR+33vJ0deADjEzHYMbB/hzN5zWw4UXAYPAtBmD3+mQ+OsG1YcaxcUD/Csr o9STENOQ24ZB4O+hDGHkjs4NXddqknKPZ46ZVqI7I8eUN1arIxFvEVyz0K9seOniWf gf4Z88eTeeXFdkzhKXtbRtQEQJDv+WIHSo4Yq/PY= From: "wcohen at redhat dot com" To: systemtap@sourceware.org Subject: [Bug runtime/31006] New: systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le Date: Mon, 30 Oct 2023 00:13:08 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: systemtap X-Bugzilla-Component: runtime X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: wcohen at redhat dot com X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: systemtap at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://sourceware.org/bugzilla/show_bug.cgi?id=3D31006 Bug ID: 31006 Summary: systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le Product: systemtap Version: unspecified Status: NEW Severity: normal Priority: P2 Component: runtime Assignee: systemtap at sourceware dot org Reporter: wcohen at redhat dot com Target Milestone: --- When running the tests on various machine with the latest git checkout of systemtap I found that the following test was causing reboots on RHEL8 machines: make installcheck RUNTESTFLAGS=3D"systemtap.context/context.exp" The machine is running the following kernel: # rpm -q kernel kernel-4.18.0-372.78.1.el8_6.ppc64le See the tail end of the "make installcheck ..." test fail with: Running /root/systemtap_write/systemtap/testsuite/systemtap.context/context= .exp ... FAIL: early test shutdown (m1 eof) FAIL: print_syms of yyy_func2 (0) FAIL: backtrace of yyy_func3 (0) FAIL: print_syms of yyy_func3 (0) FAIL: backtrace of yyy_func4 (0) FAIL: print_syms of yyy_func4 (0) FAIL: print_syms didn't find systemtap_test_module1 (1) FAIL: print_syms didn't find [kernel] (0) FAIL: backtrace.stp didn't call exit (0) Then see following fault on the console and the machine reboots: [ 456.849576] systemtap_test_module2: loading out-of-tree module taints kernel.=20 [ 456.851419] systemtap_test_module2: module verification failed: signature and/or required key missing - tainting kernel=20 [ 492.972999] Kprobes globally unoptimized=20 [ 506.452744] Unable to handle kernel paging request for data at address 0xd00000000213fffc=20 [ 506.452756] Faulting instruction address: 0xc00000000005f60c=20 [ 506.452761] Oops: Kernel access of bad area, sig: 11 [#1]=20 [ 506.452764] LE SMP NR_CPUS=3D2048 NUMA pSeries=20 [ 506.452771] Modules linked in: stap_930f8f23a61453c40ca1c17d02dbb00b__7993(OE) systemtap_test_module1(OE) systemtap_test_module2(OE) bonding tls xts pseries_rng uio_pdrv_genirq uio vmx_crypto xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_s= rp dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_0faf39736bc24bb3175d9e070d8b7d3e_7449]=20 [ 506.452801] CPU: 0 PID: 7993 Comm: stapio Kdump: loaded Tainted: G=20=20= =20=20=20=20=20=20=20=20 OE --------- - - 4.18.0-372.78.1.el8_6.ppc64le #1=20 [ 506.452808] NIP: c00000000005f60c LR: c0000000002e2a28 CTR: 0000000000000000=20 [ 506.452813] REGS: c000000005aa7790 TRAP: 0300 Tainted: G OE= =20=20=20 --------- - - (4.18.0-372.78.1.el8_6.ppc64le)=20 [ 506.452818] MSR: 800000000280b033 CR: 24008824 XER: 00000000=20 [ 506.452834] CFAR: c00000000005f604 DAR: d00000000213fffc DSISR: 40000000 IRQMASK: 0=20=20 [ 506.452834] GPR00: c0000000002e2a28 c000000005aa7a20 c000000002174900 d000000001faa4a0=20=20 [ 506.452834] GPR04: d000000002140000 d000000002140820 c0000000021a3c00 c000000007bac000=20=20 [ 506.452834] GPR08: c000000007bac000 d000000002140000 000000000000001f 0000000000000000=20=20 [ 506.452834] GPR12: 0000000000002000 c000000002e00000 0000000010015090 0000000010015398=20=20 [ 506.452834] GPR16: 00000000100153b8 00000000100153c0 0000000000000001 0000000000000000=20=20 [ 506.452834] GPR20: 0000000010014ef0 0000000000000000 c0000000021b2e88 d000000001fa9b18=20=20 [ 506.452834] GPR24: c000000001c70748 d000000002140680 d000000001faa4b0 c000000002493580=20=20 [ 506.452834] GPR28: c000000002019460 00000000000000ff d000000001faa4a0 000000007c0802a6=20=20 [ 506.452892] NIP [c00000000005f60c] arch_prepare_kprobe+0x4c/0x1f0=20 [ 506.452899] LR [c0000000002e2a28] register_kprobe+0x798/0x9b0=20 [ 506.452904] Call Trace:=20 [ 506.452907] [c000000005aa7a20] [c000000005aa7a60] 0xc000000005aa7a60 (unreliable)=20 [ 506.452913] [c000000005aa7aa0] [c0000000002e2a28] register_kprobe+0x798/0x9b0=20 [ 506.452919] [c000000005aa7b10] [c0000000002e2dc0] register_kretprobe.part.16+0xc0/0x3d0=20 [ 506.452926] [c000000005aa7b60] [c0000000002e3428] register_kretprobe+0x128/0x190=20 [ 506.452932] [c000000005aa7ba0] [d000000001f921c0] stapkp_register_probe+0xe8/0x1f0 [stap_930f8f23a61453c40ca1c17d02dbb00b__79= 93]=20 [ 506.452940] [c000000005aa7be0] [d000000001f9b2ec] systemtap_module_init+0x8d4/0xac0 [stap_930f8f23a61453c40ca1c17d02dbb00b__7= 993]=20 [ 506.452949] [c000000005aa7cc0] [d000000001f9b6d8] _stp_ctl_write_cmd+0x200/0x10f0 [stap_930f8f23a61453c40ca1c17d02dbb00b__799= 3]=20 [ 506.452958] [c000000005aa7d90] [c000000000659360] proc_reg_write+0x90/0x= 1a0=20 [ 506.452964] [c000000005aa7dc0] [c000000000573ae4] sys_write+0x134/0x3a0= =20 [ 506.452971] [c000000005aa7e20] [c00000000000b408] system_call+0x5c/0x70= =20 [ 506.452977] Instruction dump:=20 [ 506.452980] fbc1fff0 fbe1fff8 f8010010 f821ff81 3ba000ff e9230028 7c7e1b= 78 83e90000=20=20 [ 506.452991] 57ea36be 2f8a0001 409e0008 83a90004 <8149fffc> 7fe7fb78 554a= 36be 2b8a0001=20=20 [ 506.453003] ---[ end trace 21503c792de92f64 ]---=20 [ 506.460194]=20=20 [ 506.460201] Sending IPI to other CPUs=20 [ 506.461497] IPI complete=20 [ 506.482748] kexec: Starting switchover sequence.=20 I'm in purgatory=20 I attempted to replicate the problem using perf kprobe based events describ= ed by https://www.kernel.org/doc/html/v4.19/trace/kprobetrace.html on the test_module2 yyy. The perf probe was triggered events but did not crash the machine. The yyy function is at the very beginning of the module. There seems to be an access before the the start of module that is triggering the problem. # addr2line -ie /usr/lib/debug/lib/modules/4.18.0-372.78.1.el8_6.ppc64le/vmlinux 0xc00000000005f60c /usr/src/debug/kernel-4.18.0-372.78.1.el8_6/linux-4.18.0-372.78.1.el8_6.ppc= 64le/./arch/powerpc/include/asm/inst.h:54 /usr/src/debug/kernel-4.18.0-372.78.1.el8_6/linux-4.18.0-372.78.1.el8_6.ppc= 64le/arch/powerpc/kernel/kprobes.c:124 The code in arch_prepare_kprobe(struct kprobe *p) appears to be accessing before the beginning of the function: int arch_prepare_kprobe(struct kprobe *p) { int ret =3D 0; struct kprobe *prev; struct ppc_inst insn =3D ppc_inst_read((struct ppc_inst *)p->addr); struct ppc_inst prefix =3D ppc_inst_read((struct ppc_inst *)(p->add= r - 1)); //<--- I wasn't able to replicate this on rhel9 as the test does not currently com= pile because changes in a function that is called. --=20 You are receiving this mail because: You are the assignee for the bug.=