public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64
@ 2014-07-08 14:33 dsmith at redhat dot com
  2014-07-08 19:56 ` [Bug runtime/17126] " dsmith at redhat dot com
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-07-08 14:33 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

            Bug ID: 17126
           Summary: tracepoints.exp testcase causing stalls/hang on ppc64
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: dsmith at redhat dot com
            Target: ppc64

When I run the tracepoints.exp testcase on RHEL7 ppc64 (3.10.0-123.el7.ppc64),
the testcase never finishes and can't be interrupted. The console output
reports several processor stalls:

====
[  517.597905] stap_1fc11586ceeef8b8c42a1c32930268cd_4_3498: module
verification failed: signature and/or required key missing - tainting kernel
[  577.656336] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by
17, t=6003 jiffies, g=614, c=613, q=0)
[  577.656360] Task dump for CPU 0:
[  577.656364] swapper/0       R  running task        0     0      0 0x00000000
[  577.656372] Call Trace:
[  577.656378] [c000000001273820] [0000007e43fbfed6] 0x7e43fbfed6 (unreliable)
[  577.656389] [c0000000012739f0] [c000000001273aa0]
init_thread_union+0x3aa0/0x4000
[  577.656396] [c000000001273a80] [c000000001273b10]
init_thread_union+0x3b10/0x4000
[  637.676283] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16}
(detected by 17, t=6002 jiffies, g=615, c=614, q=0)
[  637.676306] Task dump for CPU 7:
[  637.676311] swapper/7       R  running task        0     0      1 0x00000804
[  637.676318] Call Trace:
[  637.676323] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[  637.676331] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[  637.676336] Task dump for CPU 12:
[  637.676340] swapper/12      R  running task        0     0      1 0x00000800
[  637.676347] Call Trace:
[  637.676351] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[  637.676359] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[  637.676364] Task dump for CPU 13:
[  637.676367] swapper/13      R  running task        0     0      1 0x00000800
[  637.676374] Call Trace:
[  637.676379] [c0000003ccbdb870] [c00000000011b63c]
.find_busiest_group+0x2ec/0x930 (unreliable)
[  637.676387] [c0000003ccbdba40] [0000000000000c00] 0xc00
[  637.676392] Task dump for CPU 16:
[  637.676395] stapio          R  running task        0  3498   3497 0x00008084
[  637.676402] Call Trace:
[  637.676406] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00
(unreliable)
[  637.676415] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[  817.746133] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16}
(detected by 17, t=24008 jiffies, g=615, c=614, q=0)
[  817.746157] Task dump for CPU 7:
[  817.746161] swapper/7       R  running task        0     0      1 0x00000804
[  817.746169] Call Trace:
[  817.746174] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[  817.746182] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[  817.746187] Task dump for CPU 12:
[  817.746191] swapper/12      R  running task        0     0      1 0x00000800
[  817.746197] Call Trace:
[  817.746201] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[  817.746209] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[  817.746214] Task dump for CPU 13:
[  817.746217] swapper/13      R  running task        0     0      1 0x00000800
[  817.746224] Call Trace:
[  817.746229] [c0000003ccbdb870] [c00000000011b63c]
.find_busiest_group+0x2ec/0x930 (unreliable)
[  817.746237] [c0000003ccbdba40] [0000000000000c00] 0xc00
[  817.746242] Task dump for CPU 16:
[  817.746245] stapio          R  running task        0  3498   3497 0x00008084
[  817.746252] Call Trace:
[  817.746256] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00
(unreliable)
[  817.746264] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[  997.785985] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16}
(detected by 15, t=42013 jiffies, g=615, c=614, q=0)
[  997.786018] Task dump for CPU 7:
[  997.786024] swapper/7       R  running task        0     0      1 0x00000804
[  997.786034] Call Trace:
[  997.786043] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[  997.786053] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[  997.786060] Task dump for CPU 12:
[  997.786065] swapper/12      R  running task        0     0      1 0x00000800
[  997.786074] Call Trace:
[  997.786079] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[  997.786089] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[  997.786095] Task dump for CPU 13:
[  997.786100] swapper/13      R  running task        0     0      1 0x00000804
[  997.786109] Call Trace:
[  997.786117] [c0000003ccbdb870] [c00000000011b63c]
.find_busiest_group+0x2ec/0x930 (unreliable)
[  997.786130] [c0000003ccbdba40] [0000000000000c00] 0xc00
[  997.786137] Task dump for CPU 16:
[  997.786141] stapio          R  running task        0  3498   3497 0x00008084
[  997.786150] Call Trace:
[  997.786155] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00
(unreliable)
[  997.786166] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[ 1177.835828] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16}
(detected by 17, t=60018 jiffies, g=615, c=614, q=0)
[ 1177.835853] Task dump for CPU 7:
[ 1177.835857] swapper/7       R  running task        0     0      1 0x00000804
[ 1177.835865] Call Trace:
[ 1177.835870] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[ 1177.835878] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[ 1177.835883] Task dump for CPU 12:
[ 1177.835887] swapper/12      R  running task        0     0      1 0x00000800
[ 1177.835893] Call Trace:
[ 1177.835897] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[ 1177.835905] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[ 1177.835909] Task dump for CPU 13:
[ 1177.835913] swapper/13      R  running task        0     0      1 0x00000804
[ 1177.835919] Call Trace:
[ 1177.835924] [c0000003ccbdb870] [c00000000011b63c]
.find_busiest_group+0x2ec/0x930 (unreliable)
[ 1177.835932] [c0000003ccbdba40] [0000000000000c00] 0xc00
[ 1177.835937] Task dump for CPU 16:
[ 1177.835941] stapio          R  running task        0  3498   3497 0x00008084
[ 1177.835947] Call Trace:
[ 1177.835951] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00
(unreliable)
[ 1177.835959] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[ 1357.905677] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16}
(detected by 17, t=78024 jiffies, g=615, c=614, q=0)
[ 1357.905702] Task dump for CPU 7:
[ 1357.905706] swapper/7       R  running task        0     0      1 0x00000804
[ 1357.905713] Call Trace:
[ 1357.905718] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[ 1357.905726] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[ 1357.905732] Task dump for CPU 12:
[ 1357.905735] swapper/12      R  running task        0     0      1 0x00000800
[ 1357.905742] Call Trace:
[ 1357.905746] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[ 1357.905753] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[ 1357.905758] Task dump for CPU 13:
[ 1357.905762] swapper/13      R  running task        0     0      1 0x00000804
[ 1357.905768] Call Trace:
[ 1357.905773] [c0000003ccbdb870] [c00000000011b63c]
.find_busiest_group+0x2ec/0x930 (unreliable)
[ 1357.905781] [c0000003ccbdba40] [0000000000000c00] 0xc00
[ 1357.905786] Task dump for CPU 16:
[ 1357.905790] stapio          R  running task        0  3498   3497 0x00008084
[ 1357.905796] Call Trace:
[ 1357.905800] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00
(unreliable)
[ 1357.905808] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
[ 1538.005524] INFO: rcu_sched detected stalls on CPUs/tasks: { 7 12 13 16}
(detected by 17, t=96034 jiffies, g=615, c=614, q=0)
[ 1538.005548] Task dump for CPU 7:
[ 1538.005553] swapper/7       R  running task        0     0      1 0x00000804
[ 1538.005560] Call Trace:
[ 1538.005565] [c0000003ccbc3870] [0000006d368b1ce6] 0x6d368b1ce6 (unreliable)
[ 1538.005573] [c0000003ccbc3a40] [0000000000000c00] 0xc00
[ 1538.005578] Task dump for CPU 12:
[ 1538.005582] swapper/12      R  running task        0     0      1 0x00000800
[ 1538.005589] Call Trace:
[ 1538.005592] [c0000003ccbd7870] [000000592709e938] 0x592709e938 (unreliable)
[ 1538.005600] [c0000003ccbd7a40] [0000000000000c00] 0xc00
[ 1538.005605] Task dump for CPU 13:
[ 1538.005608] swapper/13      R  running task        0     0      1 0x00000804
[ 1538.005615] Call Trace:
[ 1538.005620] [c0000003ccbdb870] [c00000000011b63c]
.find_busiest_group+0x2ec/0x930 (unreliable)
[ 1538.005628] [c0000003ccbdba40] [0000000000000c00] 0xc00
[ 1538.005633] Task dump for CPU 16:
[ 1538.005636] stapio          R  running task        0  3498      1 0x00008084
[ 1538.005643] Call Trace:
[ 1538.005647] [c0000003c43ab540] [c0000003d3ec3d00] 0xc0000003d3ec3d00
(unreliable)
[ 1538.005655] [c0000003c43ab710] [c000000000c50380] decrementers+0x0/0x100
====

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
@ 2014-07-08 19:56 ` dsmith at redhat dot com
  2014-07-09 14:03 ` dsmith at redhat dot com
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-07-08 19:56 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #1 from David Smith <dsmith at redhat dot com> ---
From some more debugging, it appears to be something going on with tracepoints
that start with 'h'. If I change tracepoints.stp to probe
'kernel.trace("[a-gi-z]*")', the testcase works fine.

Here are the tracepoints that start with 'h':

====
# stap -l 'kernel.trace("h*")'
kernel.trace("hcall_entry")
kernel.trace("hcall_exit")
kernel.trace("host1x_cdma_begin")
kernel.trace("host1x_cdma_end")
kernel.trace("host1x_cdma_push")
kernel.trace("host1x_cdma_push_gather")
kernel.trace("host1x_channel_open")
kernel.trace("host1x_channel_release")
kernel.trace("host1x_channel_submit")
kernel.trace("host1x_channel_submit_complete")
kernel.trace("host1x_channel_submitted")
kernel.trace("host1x_syncpt_load_min")
kernel.trace("host1x_syncpt_wait_check")
kernel.trace("host1x_wait_cdma")
kernel.trace("hrtimer_cancel")
kernel.trace("hrtimer_expire_entry")
kernel.trace("hrtimer_expire_exit")
kernel.trace("hrtimer_init")
kernel.trace("hrtimer_start")
====

So, there are basically 3 sets: hcall_*, host1x_*, and hrtimer_*. Each of those
sets works fine individually. I haven't quite found the combination that causes
the stall.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
  2014-07-08 19:56 ` [Bug runtime/17126] " dsmith at redhat dot com
@ 2014-07-09 14:03 ` dsmith at redhat dot com
  2014-07-09 21:02 ` dsmith at redhat dot com
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-07-09 14:03 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #2 from David Smith <dsmith at redhat dot com> ---
After some more testing, it appears that any combination of the hrtimer_*
tracepoints with the hcall_* tracepoints cause the stall/hang.

The hcall_* tracepoints are ppc64 specific. I believe 'hcall' is short for
hypervisor call.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
  2014-07-08 19:56 ` [Bug runtime/17126] " dsmith at redhat dot com
  2014-07-09 14:03 ` dsmith at redhat dot com
@ 2014-07-09 21:02 ` dsmith at redhat dot com
  2014-07-10 15:44 ` dsmith at redhat dot com
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-07-09 21:02 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #3 from David Smith <dsmith at redhat dot com> ---
Using "perf" (via the following command) doesn't show the stall.

     # perf record -e 'timer:hrtimer_*' -e 'powerpc:hcall_*' -a ls

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (2 preceding siblings ...)
  2014-07-09 21:02 ` dsmith at redhat dot com
@ 2014-07-10 15:44 ` dsmith at redhat dot com
  2014-07-21  9:08 ` hemkshaw at in dot ibm.com
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-07-10 15:44 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #4 from David Smith <dsmith at redhat dot com> ---
Using "stap -DSTP_ALIBI" (which ifdef's out probe bodies) doesn't show the
stall either. This means that something the probe body is doing is causing the
hang.

Over in __trace_hcall_entry() (in arch/powerpc/platforms/pseries/lpar.c), I see
the following comment:

    /*
     * We cannot call tracepoints inside RCU idle regions which
     * means we must not trace H_CEDE.
     */

I wonder if a specific hcall opcode triggers this stall.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (3 preceding siblings ...)
  2014-07-10 15:44 ` dsmith at redhat dot com
@ 2014-07-21  9:08 ` hemkshaw at in dot ibm.com
  2014-07-21 11:38 ` fche at redhat dot com
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: hemkshaw at in dot ibm.com @ 2014-07-21  9:08 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

hemkshaw at in dot ibm.com <hemkshaw at in dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hemkshaw at in dot ibm.com

--- Comment #5 from hemkshaw at in dot ibm.com <hemkshaw at in dot ibm.com> ---
Hi,
I am trying to reproduce this bug on a RHEL7 system. But due to some reasons,
stap is not able to find the trace points related to hcall.
# uname -r
3.10.0-123.el7.ppc64

And the debug and devel pkgs for this kernel are also installed :
# rpm -qa | grep kernel-de
kernel-devel-3.10.0-123.el7.ppc64
kernel-debug-3.10.0-123.el7.ppc64

But,
# stap -l 'kernel.trace("hcall_*")'
shows nothing!

Although, perf is able to show them,
# perf list tracepoints | grep hcall
  powerpc:hcall_entry                                [Tracepoint event]
  powerpc:hcall_exit                                 [Tracepoint event]

However, stap is able to show the events related to hrtimer and is also able to
probe into these events, viz.

# stap -l 'kernel.trace("hrtimer_*")'
kernel.trace("hrtimer_cancel")
kernel.trace("hrtimer_expire_entry")
kernel.trace("hrtimer_expire_exit")
kernel.trace("hrtimer_init")
kernel.trace("hrtimer_start")

Since, the problem is due to the combination of probing into hrtimer_* and
hcall_*, I am unable to produce this bug as systemtap is failing to probe into
hcall_* :

# stap -v -e 'probe kernel.trace("hcall_*") { printf("hcall hit!"); }'
Pass 1: parsed user script and 97 library script(s) using
153600virt/35968res/7936shr/24640data kb, in 220usr/10sys/233real ms.
semantic error: while resolving probe point: identifier 'kernel' at <input>:1:7
        source: probe kernel.trace("hcall_*") { printf("hcall hit!"); }
                      ^

semantic error: no match (similar functions: kfree, kmalloc, console, kvm_fpu,
cpu_idle)
Pass 2: analyzed script: 0 probe(s), 0 function(s), 0 embed(s), 0 global(s)
using 164416virt/46144res/11648shr/28608data kb, in 60usr/180sys/243real ms.
Pass 2: analysis failed.  [man error::pass2]

Thanks,
Hemant

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (4 preceding siblings ...)
  2014-07-21  9:08 ` hemkshaw at in dot ibm.com
@ 2014-07-21 11:38 ` fche at redhat dot com
  2014-07-28 12:03 ` hemkshaw at in dot ibm.com
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2014-07-21 11:38 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fche at redhat dot com

--- Comment #6 from Frank Ch. Eigler <fche at redhat dot com> ---
Please install the kernel-debuginfo-3.10.0-123.el7.ppc64 package,
not the kernel-debug-3.10.0-123.el7.ppc64 package.  The former
contains kernel sources and debugging object files; the latter
contains kernel binaries for a completely separate configuration/build.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (5 preceding siblings ...)
  2014-07-21 11:38 ` fche at redhat dot com
@ 2014-07-28 12:03 ` hemkshaw at in dot ibm.com
  2014-07-28 15:00 ` dsmith at redhat dot com
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: hemkshaw at in dot ibm.com @ 2014-07-28 12:03 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #7 from hemkshaw at in dot ibm.com <hemkshaw at in dot ibm.com> ---
Thanks Frank, I installed the kernel-debuginfo and kernel-debuginfo-common
packages for the required kernel.
But, even after installation of kernel-debuginfo-3.10.0-123.el7.ppc64 package,
the problem still persists.

# rpm -qa | grep kernel-debuginfo
kernel-debuginfo-3.10.0-123.el7.ppc64
kernel-debuginfo-common-ppc64-3.10.0-123.el7.ppc64

# stap -l' kernel.trace("hcall_*")'
#

I looked into /proc/kallsysms to find where this hcall related symbols differ
from others, but the hcall related symbols and hrtimer related symbols were
similar. 
The /proc/kallsyms have entries for __tracepoint_ptr_hcall_entry and
__tracepoint_ptr_hrtimer_init.

I strace'd stap to try and find out where actually stap is failing to find the
hcall_* events, but stap is going through all the usual files it needs. So far,
 I didn't notice anything unusual on the strace o/p.

Thanks,
Hemant

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (6 preceding siblings ...)
  2014-07-28 12:03 ` hemkshaw at in dot ibm.com
@ 2014-07-28 15:00 ` dsmith at redhat dot com
  2014-07-29  9:34 ` hemkshaw at in dot ibm.com
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-07-28 15:00 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #8 from David Smith <dsmith at redhat dot com> ---
(In reply to hemkshaw@in.ibm.com from comment #7)
> Thanks Frank, I installed the kernel-debuginfo and kernel-debuginfo-common
> packages for the required kernel.
> But, even after installation of kernel-debuginfo-3.10.0-123.el7.ppc64
> package, the problem still persists.
> 
> # rpm -qa | grep kernel-debuginfo
> kernel-debuginfo-3.10.0-123.el7.ppc64
> kernel-debuginfo-common-ppc64-3.10.0-123.el7.ppc64
> 
> # stap -l' kernel.trace("hcall_*")'
> #

Does "perf list tracepoint" show the hcall tracepoints on your system?

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (7 preceding siblings ...)
  2014-07-28 15:00 ` dsmith at redhat dot com
@ 2014-07-29  9:34 ` hemkshaw at in dot ibm.com
  2014-07-29 16:23 ` fche at redhat dot com
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: hemkshaw at in dot ibm.com @ 2014-07-29  9:34 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #9 from hemkshaw at in dot ibm.com <hemkshaw at in dot ibm.com> ---
Hi David,
(In reply to David Smith from comment #8)
> (In reply to hemkshaw@in.ibm.com from comment #7)
> > Thanks Frank, I installed the kernel-debuginfo and kernel-debuginfo-common
[SNIP]
> > # stap -l' kernel.trace("hcall_*")'
> > #
> 
> Does "perf list tracepoint" show the hcall tracepoints on your system?

Yes, perf list tracepoints show the hcall events :

# perf list tracepoints | grep hcall
  powerpc:hcall_entry                                [Tracepoint event]
  powerpc:hcall_exit                                 [Tracepoint event]

Thanks,
Hemant

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (8 preceding siblings ...)
  2014-07-29  9:34 ` hemkshaw at in dot ibm.com
@ 2014-07-29 16:23 ` fche at redhat dot com
  2014-10-20 11:28 ` hemkshaw at in dot ibm.com
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2014-07-29 16:23 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #10 from Frank Ch. Eigler <fche at redhat dot com> ---
Can you run

% stap --poison-cache -k --vp 05000 -L 'kernel.trace("hcall_*")'

and check out the resulting messages & /var/tmp/stapFOO directory,
as to processing of the arch/powerpc/include/asm/trace.h tracepoints?

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (9 preceding siblings ...)
  2014-07-29 16:23 ` fche at redhat dot com
@ 2014-10-20 11:28 ` hemkshaw at in dot ibm.com
  2014-10-20 20:18 ` dsmith at redhat dot com
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: hemkshaw at in dot ibm.com @ 2014-10-20 11:28 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #11 from hemkshaw at in dot ibm.com <hemkshaw at in dot ibm.com> ---
Hi,
If I try to reproduce this bug with systemtap-2.6.0 and (upstream)kernel v3.17,
I can't reproduce this.

# cat my_probes.stp
probe kernel.trace("hcall_entry") { printf("hcall_entry\n"); }
probe kernel.trace("hcall_exit") { printf("hcall_exit\n"); }
probe kernel.trace("hrtimer*") { printf("hrtimer_\n"); }

# stap -v my_probes.stp

hrtimer_
hrtimer_
hrtimer_
...

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (10 preceding siblings ...)
  2014-10-20 11:28 ` hemkshaw at in dot ibm.com
@ 2014-10-20 20:18 ` dsmith at redhat dot com
  2014-10-20 20:29 ` dsmith at redhat dot com
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-10-20 20:18 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #12 from David Smith <dsmith at redhat dot com> ---
I just tried tracepoints.exp on 3.10.0-188.el7.ppc64 and HEAD systemtap. I
still get the hang, both with the traceponts.exp testcase and the following
1-line script:

# stap -ve 'probe kernel.trace("hcall_*"), kernel.trace("hrtimer_*") {
printf("%s\n", ppfunc()) }'

Note that however I do get different behavior. With the tracepoints.exp
testcase the system's console reported a backtrace similar to comment #1. With
the 1-liner, I get a hard system hang with nothing reported on the console.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (11 preceding siblings ...)
  2014-10-20 20:18 ` dsmith at redhat dot com
@ 2014-10-20 20:29 ` dsmith at redhat dot com
  2014-10-21 15:46 ` hemkshaw at in dot ibm.com
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-10-20 20:29 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #13 from David Smith <dsmith at redhat dot com> ---
I don't see any changes in __trace_hcall_entry()/__trace_hcall_exit() between
RHEL7's source and the latest kernel sources.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (12 preceding siblings ...)
  2014-10-20 20:29 ` dsmith at redhat dot com
@ 2014-10-21 15:46 ` hemkshaw at in dot ibm.com
  2014-10-21 17:04 ` dsmith at redhat dot com
  2014-11-20 21:33 ` fche at redhat dot com
  15 siblings, 0 replies; 17+ messages in thread
From: hemkshaw at in dot ibm.com @ 2014-10-21 15:46 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #14 from hemkshaw at in dot ibm.com <hemkshaw at in dot ibm.com> ---
(In reply to David Smith from comment #13)
> I don't see any changes in __trace_hcall_entry()/__trace_hcall_exit()
> between RHEL7's source and the latest kernel sources.

Ok, I didn't see this bug with kernel 3.17 (upstream). Although, I was able to
reproduce this bug in RHEL7 ppc64 (3.10.0-123.el7.ppc64). Both with the same
versions of systemtap.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (13 preceding siblings ...)
  2014-10-21 15:46 ` hemkshaw at in dot ibm.com
@ 2014-10-21 17:04 ` dsmith at redhat dot com
  2014-11-20 21:33 ` fche at redhat dot com
  15 siblings, 0 replies; 17+ messages in thread
From: dsmith at redhat dot com @ 2014-10-21 17:04 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

--- Comment #15 from David Smith <dsmith at redhat dot com> ---
(In reply to hemkshaw@in.ibm.com from comment #14)
> (In reply to David Smith from comment #13)
> > I don't see any changes in __trace_hcall_entry()/__trace_hcall_exit()
> > between RHEL7's source and the latest kernel sources.
> 
> Ok, I didn't see this bug with kernel 3.17 (upstream). Although, I was able
> to reproduce this bug in RHEL7 ppc64 (3.10.0-123.el7.ppc64). Both with the
> same versions of systemtap.

I'm glad to know you were able to reproduce it on RHEL7, which I don't think I
knew. Since you didn't see on on 3.17, I noted that I don't see any difference
between the source for the hcall trace functions between the 2 kernels. But it
does potentially sound like a RHEL7 kernel bug if you've used the same version
of systemtap on both the crashing and non-crashing kernels.

Thanks for continuing to look at this.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug runtime/17126] tracepoints.exp testcase causing stalls/hang on ppc64
  2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
                   ` (14 preceding siblings ...)
  2014-10-21 17:04 ` dsmith at redhat dot com
@ 2014-11-20 21:33 ` fche at redhat dot com
  15 siblings, 0 replies; 17+ messages in thread
From: fche at redhat dot com @ 2014-11-20 21:33 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=17126

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
           Assignee|systemtap at sourceware dot org    |fche at redhat dot com

--- Comment #16 from Frank Ch. Eigler <fche at redhat dot com> ---
commit c4a048331603 works around the kernel problem via a translator blacklist

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-11-20 21:33 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-08 14:33 [Bug runtime/17126] New: tracepoints.exp testcase causing stalls/hang on ppc64 dsmith at redhat dot com
2014-07-08 19:56 ` [Bug runtime/17126] " dsmith at redhat dot com
2014-07-09 14:03 ` dsmith at redhat dot com
2014-07-09 21:02 ` dsmith at redhat dot com
2014-07-10 15:44 ` dsmith at redhat dot com
2014-07-21  9:08 ` hemkshaw at in dot ibm.com
2014-07-21 11:38 ` fche at redhat dot com
2014-07-28 12:03 ` hemkshaw at in dot ibm.com
2014-07-28 15:00 ` dsmith at redhat dot com
2014-07-29  9:34 ` hemkshaw at in dot ibm.com
2014-07-29 16:23 ` fche at redhat dot com
2014-10-20 11:28 ` hemkshaw at in dot ibm.com
2014-10-20 20:18 ` dsmith at redhat dot com
2014-10-20 20:29 ` dsmith at redhat dot com
2014-10-21 15:46 ` hemkshaw at in dot ibm.com
2014-10-21 17:04 ` dsmith at redhat dot com
2014-11-20 21:33 ` fche at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).