public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug runtime/31006] New: systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le
@ 2023-10-30  0:13 wcohen at redhat dot com
  2023-10-30  0:37 ` [Bug runtime/31006] " wcohen at redhat dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: wcohen at redhat dot com @ 2023-10-30  0:13 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=31006

            Bug ID: 31006
           Summary: systemtap.contex/context.exp test causes reboot on
                    RHEL8 ppc64le
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: wcohen at redhat dot com
  Target Milestone: ---

When running the tests on various machine with the latest git checkout of
systemtap I found that the following test was causing reboots on RHEL8
machines:

  make installcheck RUNTESTFLAGS="systemtap.context/context.exp"

The machine is running the following kernel:
# rpm -q kernel
kernel-4.18.0-372.78.1.el8_6.ppc64le

See the tail end of the "make installcheck ..." test fail with:

Running /root/systemtap_write/systemtap/testsuite/systemtap.context/context.exp
...
FAIL: early test shutdown (m1 eof)
FAIL: print_syms of yyy_func2 (0)
FAIL: backtrace of yyy_func3 (0)
FAIL: print_syms of yyy_func3 (0)
FAIL: backtrace of yyy_func4 (0)
FAIL: print_syms of yyy_func4 (0)
FAIL: print_syms didn't find systemtap_test_module1 (1)
FAIL: print_syms didn't find [kernel] (0)
FAIL: backtrace.stp didn't call exit (0)


Then see following fault on the console and the machine reboots:


[  456.849576] systemtap_test_module2: loading out-of-tree module taints
kernel. 
[  456.851419] systemtap_test_module2: module verification failed: signature
and/or required key missing - tainting kernel 
[  492.972999] Kprobes globally unoptimized 
[  506.452744] Unable to handle kernel paging request for data at address
0xd00000000213fffc 
[  506.452756] Faulting instruction address: 0xc00000000005f60c 
[  506.452761] Oops: Kernel access of bad area, sig: 11 [#1] 
[  506.452764] LE SMP NR_CPUS=2048 NUMA pSeries 
[  506.452771] Modules linked in:
stap_930f8f23a61453c40ca1c17d02dbb00b__7993(OE) systemtap_test_module1(OE)
systemtap_test_module2(OE) bonding tls xts pseries_rng uio_pdrv_genirq uio
vmx_crypto xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp
dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
stap_0faf39736bc24bb3175d9e070d8b7d3e_7449] 
[  506.452801] CPU: 0 PID: 7993 Comm: stapio Kdump: loaded Tainted: G          
OE    --------- -  - 4.18.0-372.78.1.el8_6.ppc64le #1 
[  506.452808] NIP:  c00000000005f60c LR: c0000000002e2a28 CTR:
0000000000000000 
[  506.452813] REGS: c000000005aa7790 TRAP: 0300   Tainted: G           OE   
--------- -  -  (4.18.0-372.78.1.el8_6.ppc64le) 
[  506.452818] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
24008824  XER: 00000000 
[  506.452834] CFAR: c00000000005f604 DAR: d00000000213fffc DSISR: 40000000
IRQMASK: 0  
[  506.452834] GPR00: c0000000002e2a28 c000000005aa7a20 c000000002174900
d000000001faa4a0  
[  506.452834] GPR04: d000000002140000 d000000002140820 c0000000021a3c00
c000000007bac000  
[  506.452834] GPR08: c000000007bac000 d000000002140000 000000000000001f
0000000000000000  
[  506.452834] GPR12: 0000000000002000 c000000002e00000 0000000010015090
0000000010015398  
[  506.452834] GPR16: 00000000100153b8 00000000100153c0 0000000000000001
0000000000000000  
[  506.452834] GPR20: 0000000010014ef0 0000000000000000 c0000000021b2e88
d000000001fa9b18  
[  506.452834] GPR24: c000000001c70748 d000000002140680 d000000001faa4b0
c000000002493580  
[  506.452834] GPR28: c000000002019460 00000000000000ff d000000001faa4a0
000000007c0802a6  
[  506.452892] NIP [c00000000005f60c] arch_prepare_kprobe+0x4c/0x1f0 
[  506.452899] LR [c0000000002e2a28] register_kprobe+0x798/0x9b0 
[  506.452904] Call Trace: 
[  506.452907] [c000000005aa7a20] [c000000005aa7a60] 0xc000000005aa7a60
(unreliable) 
[  506.452913] [c000000005aa7aa0] [c0000000002e2a28]
register_kprobe+0x798/0x9b0 
[  506.452919] [c000000005aa7b10] [c0000000002e2dc0]
register_kretprobe.part.16+0xc0/0x3d0 
[  506.452926] [c000000005aa7b60] [c0000000002e3428]
register_kretprobe+0x128/0x190 
[  506.452932] [c000000005aa7ba0] [d000000001f921c0]
stapkp_register_probe+0xe8/0x1f0 [stap_930f8f23a61453c40ca1c17d02dbb00b__7993] 
[  506.452940] [c000000005aa7be0] [d000000001f9b2ec]
systemtap_module_init+0x8d4/0xac0 [stap_930f8f23a61453c40ca1c17d02dbb00b__7993] 
[  506.452949] [c000000005aa7cc0] [d000000001f9b6d8]
_stp_ctl_write_cmd+0x200/0x10f0 [stap_930f8f23a61453c40ca1c17d02dbb00b__7993] 
[  506.452958] [c000000005aa7d90] [c000000000659360] proc_reg_write+0x90/0x1a0 
[  506.452964] [c000000005aa7dc0] [c000000000573ae4] sys_write+0x134/0x3a0 
[  506.452971] [c000000005aa7e20] [c00000000000b408] system_call+0x5c/0x70 
[  506.452977] Instruction dump: 
[  506.452980] fbc1fff0 fbe1fff8 f8010010 f821ff81 3ba000ff e9230028 7c7e1b78
83e90000  
[  506.452991] 57ea36be 2f8a0001 409e0008 83a90004 <8149fffc> 7fe7fb78 554a36be
2b8a0001  
[  506.453003] ---[ end trace 21503c792de92f64 ]--- 
[  506.460194]  
[  506.460201] Sending IPI to other CPUs 
[  506.461497] IPI complete 
[  506.482748] kexec: Starting switchover sequence. 
I'm in purgatory 


I attempted to replicate the problem using perf kprobe based events described
by https://www.kernel.org/doc/html/v4.19/trace/kprobetrace.html on the
test_module2 yyy.
The perf probe was triggered events but did not crash the machine.  The yyy
function is at the very beginning of the module. There seems to be an access
before the the start of module that is triggering the problem.

#  addr2line -ie
/usr/lib/debug/lib/modules/4.18.0-372.78.1.el8_6.ppc64le/vmlinux
0xc00000000005f60c
/usr/src/debug/kernel-4.18.0-372.78.1.el8_6/linux-4.18.0-372.78.1.el8_6.ppc64le/./arch/powerpc/include/asm/inst.h:54
/usr/src/debug/kernel-4.18.0-372.78.1.el8_6/linux-4.18.0-372.78.1.el8_6.ppc64le/arch/powerpc/kernel/kprobes.c:124

The code in arch_prepare_kprobe(struct kprobe *p) appears to be accessing
before the beginning of the function:

int arch_prepare_kprobe(struct kprobe *p)
{
        int ret = 0;
        struct kprobe *prev;
        struct ppc_inst insn = ppc_inst_read((struct ppc_inst *)p->addr);
        struct ppc_inst prefix = ppc_inst_read((struct ppc_inst *)(p->addr -
1)); //<---


I wasn't able to replicate this on rhel9 as the test does not currently compile
because changes in a function that is called.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug runtime/31006] systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le
  2023-10-30  0:13 [Bug runtime/31006] New: systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le wcohen at redhat dot com
@ 2023-10-30  0:37 ` wcohen at redhat dot com
  2023-10-30  1:32 ` wcohen at redhat dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: wcohen at redhat dot com @ 2023-10-30  0:37 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=31006

--- Comment #1 from William Cohen <wcohen at redhat dot com> ---
Looking at the /sys/kernel/debug/tracing/trace I see why now it didn' trigger
the problem as the probe was put on yyy_int+0x4 rather than yyy_int+0x0 as
systemtap is doing:

echo 'p:myprobe systemtap_test_module2:yyy_int $comm' >
/sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
echo  1 > /proc/stap_test_cmd 
echo  1 > /proc/stap_test_cmd

# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
            bash-1976  [006] ....  1525.064248: myprobe: (yyy_int+0x4/0x20
[systemtap_test_module2]) arg1="bash"
            bash-1976  [006] ....  1617.000845: myprobe: (yyy_int+0x4/0x20
[systemtap_test_module2]) arg1="bash"

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug runtime/31006] systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le
  2023-10-30  0:13 [Bug runtime/31006] New: systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le wcohen at redhat dot com
  2023-10-30  0:37 ` [Bug runtime/31006] " wcohen at redhat dot com
@ 2023-10-30  1:32 ` wcohen at redhat dot com
  2023-10-30  1:33 ` wcohen at redhat dot com
  2023-10-30  7:37 ` fche at redhat dot com
  3 siblings, 0 replies; 5+ messages in thread
From: wcohen at redhat dot com @ 2023-10-30  1:32 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=31006

William Cohen <wcohen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from William Cohen <wcohen at redhat dot com> ---
Did some digging around and found that this has been fixed in the
kernel-4.18.0-520.el8.ppc64le kernel.  With that kernel the test just fails
rather than causing the system to reboot:

unning /usr/share/systemtap/testsuite/systemtap.context/context.exp ...
FAIL: backtrace - could not build modules
FAIL: args - could not build modules
FAIL: pid - could not build modules
FAIL: num_args - could not build modules
FAIL: symfileline - could not build modules

                === systemtap Summary ===

# of unexpected failures        5

Looking through the red hat issue looks like there might also be some RHEL9
kernels version that get similar oops (RHELPLAN-142467) and appears to be fixed
in RHEL9.2.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug runtime/31006] systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le
  2023-10-30  0:13 [Bug runtime/31006] New: systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le wcohen at redhat dot com
  2023-10-30  0:37 ` [Bug runtime/31006] " wcohen at redhat dot com
  2023-10-30  1:32 ` wcohen at redhat dot com
@ 2023-10-30  1:33 ` wcohen at redhat dot com
  2023-10-30  7:37 ` fche at redhat dot com
  3 siblings, 0 replies; 5+ messages in thread
From: wcohen at redhat dot com @ 2023-10-30  1:33 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=31006

--- Comment #3 from William Cohen <wcohen at redhat dot com> ---
RHELPLAN-163851 is the RHEL issue that describe the RHEL8.9 kernel issue.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug runtime/31006] systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le
  2023-10-30  0:13 [Bug runtime/31006] New: systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le wcohen at redhat dot com
                   ` (2 preceding siblings ...)
  2023-10-30  1:33 ` wcohen at redhat dot com
@ 2023-10-30  7:37 ` fche at redhat dot com
  3 siblings, 0 replies; 5+ messages in thread
From: fche at redhat dot com @ 2023-10-30  7:37 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=31006

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fche at redhat dot com
         Resolution|FIXED                       |WORKSFORME

--- Comment #4 from Frank Ch. Eigler <fche at redhat dot com> ---
(tweaking status field to indicate that it was not a systemtap bug)

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-10-30  7:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-30  0:13 [Bug runtime/31006] New: systemtap.contex/context.exp test causes reboot on RHEL8 ppc64le wcohen at redhat dot com
2023-10-30  0:37 ` [Bug runtime/31006] " wcohen at redhat dot com
2023-10-30  1:32 ` wcohen at redhat dot com
2023-10-30  1:33 ` wcohen at redhat dot com
2023-10-30  7:37 ` fche at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).