public inbox for systemtap@sourceware.org
* [Bug runtime/19345] New: RHEL 7.0 s390x crash in check.exp
From: dsmith at redhat dot com @ 2015-12-08 18:57 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19345

            Bug ID: 19345
           Summary: RHEL 7.0 s390x crash in check.exp
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: dsmith at redhat dot com
  Target Milestone: ---

During some testing on RHEL 7.0, 7.1, and 7.2, I found that the check.exp test
case (which tests all the systemtap examples) causes a crash on RHEL 7.0
(3.10.0-123.el7.s390x). The same test case passes on 7.1 (3.10.0-229.el7.s390x)
and newer kernels.

The crash looks like:

====
[ 5232.627933] Unable to handle kernel pointer dereference at virtual kernel
address 000000f440a2e000
[ 5232.627984] Oops: 003b [#1] SMP 
[ 5232.627987] Modules linked in:
stap_e4e6b8a268df981fe9186b8096f3569c__19634(OF) binfmt_misc sg qeth_l2 vmur
nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c dasd_fba_mod dasd_eckd_mod
dasd_mod qeth lcs ctcm qdio fsm ccwgroup dm_mirror dm_region_hash dm_log dm_mod
[last unloaded: stap_9903d024d134f28f45f6801901192f4d__19622]
[ 5232.628012] CPU: 1 PID: 748 Comm: systemd-journal Tainted: GF         
O--------------   3.10.0-123.el7.s390x #1
[ 5232.628016] task: 00000000359ca440 ti: 0000000036914000 task.ti:
0000000036914000
[ 5232.628019] Krnl PSW : 0704c00180000000 000000000026fc8c
(mem_cgroup_update_page_stat+0x3c/0xa0)
[ 5232.628030]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000000936dd2 0000000000878640 000000f440a2e328 0000000000000000
[ 5232.628039]            0000000000000000 0000000001e68000 0000000034e5e160 0000000000000000
[ 5232.628050]            000003fffcd2c000 0000000000000200 000003d10004e7c0 0000000035338270
[ 5232.628051]            0000000000000000 0000000000000001 000000000026fc7c 0000000036917c18
[ 5232.628059] Krnl Code: 000000000026fc7c: c010003044e2    larl    %r1,878640
                          000000000026fc82: e33010540012    lt      %r3,84(%r1)
                         #000000000026fc88: a774002a        brc     7,26fcdc
                         >000000000026fc8c: e33020080002    ltg     %r3,8(%r2)
                          000000000026fc92: a7840025        brc     8,26fcdc
                          000000000026fc96: e31020070090    llgc    %r1,7(%r2)
                          000000000026fc9c: a7110002        tmll    %r1,2
                          000000000026fca0: a784001e        brc     8,26fcdc
[ 5232.628071] Call Trace:
[ 5232.628072] ([<0000000000000200>] 0x200)
[ 5232.628074]  [<0000000000241ec0>] page_add_file_rmap+0xa0/0xd0
[ 5232.628078]  [<000000000023274a>] __do_fault+0x182/0x5f8
[ 5232.628081]  [<00000000002377ba>] handle_mm_fault+0x462/0xe98
[ 5232.628082]  [<00000000005b3998>] do_dat_exception+0x1d8/0x358
[ 5232.628087]  [<00000000005b1de6>] pgm_check_handler+0x17a/0x17e
[ 5232.628088]  [<000000008001db0c>] 0x8001db0c
[ 5232.628090] Last Breaking-Event-Address:
[ 5232.628090]  [<0000000000271f6a>] lookup_page_cgroup+0x42/0x48
[ 5232.628092]  
[ 5232.628093] Kernel panic - not syncing: Fatal exception: panic_on_oops
====

Here's another:

====
[ 4501.794419] Unable to handle kernel pointer dereference at virtual kernel
address 000000f440b1a000
[ 4501.794465] Oops: 003b [#1] SMP 
[ 4501.794467] Modules linked in:
stap_e4e6b8a268df981fe9186b8096f3569c__64377(OF) binfmt_misc sg qeth_l2 vmur
nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c dasd_fba_mod dasd_eckd_mod
dasd_mod lcs qeth ctcm fsm qdio ccwgroup dm_mirror dm_region_hash dm_log dm_mod
[last unloaded: stap_9903d024d134f28f45f6801901192f4d__64365]
[ 4501.794493] CPU: 1 PID: 745 Comm: systemd-journal Tainted: GF         
O--------------   3.10.0-123.el7.s390x #1
[ 4501.794496] task: 000000000144f5d0 ti: 0000000034e14000 task.ti:
0000000034e14000
[ 4501.794499] Krnl PSW : 0404e00180000000 000000000026cea6
(mem_cgroup_page_lruvec+0x5e/0xc0)
[ 4501.794510]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
               Krnl GPRS: 0000000000936dd3 0000000000000048 000000f440b1a0a8 000000000000f440
[ 4501.794518]            0000000000000000 0000000001e68000 0000000000000000 00000000002156a0
[ 4501.794522]            0000000000000000 000000000279a0e0 0700000035be0000 000003d100c763c0
[ 4501.794531]            000003d100c763c0 00000000008d2c00 000000000026cea0 0000000034e179b0
[ 4501.794539] Krnl Code: 000000000026ce94: f0a0000407f4    srp     4(11,%r0),2036,0
                          000000000026ce9a: c0e500002847    brasl   %r14,271f28
                         #000000000026cea0: e310c0070090    llgc    %r1,7(%r12)
                         >000000000026cea6: e33020080004    lg      %r3,8(%r2)
                          000000000026ceac: a7110020        tmll    %r1,32
                          000000000026ceb0: a7740014        brc     7,26ced8
                          000000000026ceb4: e31020070090    llgc    %r1,7(%r2)
                          000000000026ceba: a7110002        tmll    %r1,2
[ 4501.794551] Call Trace:
[ 4501.794552] ([<0000000034e17a98>] 0x34e17a98)
[ 4501.794554]  [<0000000000216af0>] pagevec_lru_move_fn+0xf8/0x1a8
[ 4501.794559]  [<0000000000216cae>] __lru_cache_add+0x9e/0xb8
[ 4501.794560]  [<000000000024b3c0>] read_swap_cache_async+0x108/0x1b8
[ 4501.794562]  [<000000000024b4fe>] swapin_readahead+0x8e/0xd8
[ 4501.794563]  [<0000000000222006>] shmem_getpage_gfp+0x5de/0x848
[ 4501.794565]  [<00000000002222da>] shmem_fault+0x6a/0x118
[ 4501.794567]  [<000000000023264a>] __do_fault+0x82/0x5f8
[ 4501.794570]  [<00000000002377ba>] handle_mm_fault+0x462/0xe98
[ 4501.794572]  [<00000000005b3998>] do_dat_exception+0x1d8/0x358
[ 4501.794576]  [<00000000005b1de6>] pgm_check_handler+0x17a/0x17e
[ 4501.794578]  [<000003fffd10c776>] 0x3fffd10c776
[ 4501.794579] Last Breaking-Event-Address:
[ 4501.794580]  [<0000000000271f6a>] lookup_page_cgroup+0x42/0x48
[ 4501.794582]  
[ 4501.794583] Kernel panic - not syncing: Fatal exception: panic_on_oops
====

When this crash happens, the last thing in systemtap.log is:

====
attempting command stap -w functioncallcount.stp "*@mm/*.c" -c "sleep 1"
====

Executing the same command by hand seems to cause the crash. This crash happens
every time the functioncallcount.stp example is run (it is not intermittent).

-- 
You are receiving this mail because:
You are the assignee for the bug.


* [Bug runtime/19345] RHEL 7.0 s390x crash in check.exp
From: dsmith at redhat dot com @ 2015-12-08 19:34 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19345

David Smith <dsmith at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #1 from David Smith <dsmith at redhat dot com> ---
Looks like this was an s390x kprobes bug. The fix was first backported to
kernel-3.10.0-155.el7. The kernel crashed after it tried to emulate the "lgrl"
instruction.

The fix was composed of the following upstream kernel patches:

c802d64a356b5cf349121ac4c5e005f037ce548d
af96397de8600232effbff43dc8b4ca20ddc02b1
63c40436a1afc837f3ace6b5a39c547bc91c20bc

It doesn't look like there is a way for systemtap to work around this kernel
bug, so I'm going to close it.
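
As a rough illustration of the version boundary described above, the following sketch checks whether a RHEL 7 kernel release string is at or past the first build reported to carry the fix (3.10.0-155.el7). The function name `has_kprobes_fix`, the constant, and the regular expression are assumptions made for this example; they are not part of systemtap or the kernel, and the check only makes sense for 3.10.0-based RHEL 7 kernels.

```python
import re

# First RHEL 7 build reported in this thread to carry the kprobes fix.
FIRST_FIXED_BUILD = 155

def has_kprobes_fix(release):
    """Hypothetical helper: True/False for 3.10.0-based RHEL 7 release
    strings such as "3.10.0-123.el7.s390x", None for anything else."""
    match = re.match(r"^3\.10\.0-(\d+)\.el7", release)
    if match is None:
        # Not a 3.10.0 RHEL 7 kernel; this thread says nothing about it.
        return None
    return int(match.group(1)) >= FIRST_FIXED_BUILD
```

With the two kernels named in the report, this returns False for 3.10.0-123.el7.s390x (RHEL 7.0) and True for 3.10.0-229.el7.s390x (RHEL 7.1), which matches the observed crash and pass.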


* [Bug runtime/19345] RHEL 7.0 s390x crash in check.exp
From: dsmith at redhat dot com @ 2015-12-09 15:19 UTC
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=19345

David Smith <dsmith at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|WONTFIX                     |FIXED

--- Comment #2 from David Smith <dsmith at redhat dot com> ---
Added commit f27d869 to work around the issue by not running the
functioncallcount.stp example on s390x kernels older than 3.11.

This isn't a fix, but at least the testsuite won't cause an s390x RHEL 7.0
system to crash here.
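
The skip condition described above can be sketched as follows. This is only an illustration in Python, not the contents of commit f27d869 (the real change lives in the systemtap testsuite and is not shown in this thread); the function name `should_skip_functioncallcount` and the parsing approach are assumptions made for the example.

```python
import re

def should_skip_functioncallcount(arch, release):
    """Hypothetical check: skip the example on s390x kernels older than 3.11."""
    if arch != "s390x":
        return False
    # Take the leading "major.minor" pair of the release string.
    match = re.match(r"^(\d+)\.(\d+)", release)
    if match is None:
        # Unknown version format: do not skip.
        return False
    version = (int(match.group(1)), int(match.group(2)))
    return version < (3, 11)
```

On a live system the arguments could come from `platform.machine()` and `platform.release()`. Note that a plain 3.11 cutoff also skips RHEL 7.1 and later 3.10.0-based kernels that already carry the backported fix, so the check is conservative.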
